Data Visualization Notes

This is a starter RMarkdown template to accompany Data Visualization (Princeton University Press, 2019). You can use it to take notes, write your code, and produce a good-looking, reproducible document that records the work you have done. At the very top of the file is a section of metadata, or information about what the file is and what it does. The metadata is delimited by three dashes at the start and another three at the end. You should change the title, author, and date to the values that suit you. Keep the output line as it is for now, however. Each line in the metadata has a structure. First the key (“title”, “author”, etc), then a colon, and then the value associated with the key.

This is an RMarkdown File

Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. A code chunk is a specially delimited section of the file. You can add one by moving the cursor to a blank line choosing Code > Insert Chunk from the RStudio menu. When you do, an empty chunk will appear:

Code chunks are delimited by three backticks (found to the left of the 1 key on US and UK keyboards) at the start and end. The opening backticks also have a pair of braces and the letter r, to indicate what language the chunk is written in. You write your code inside the code chunks. Write your notes and other material around them, as here.

Before you Begin

To install the tidyverse, make sure you have an Internet connection. Then manually run the code in the chunk below. If you knit the document if will be skipped. We do this because you only need to install these packages once, not every time you run this file. Either knit the chunk using the little green “play” arrow to the right of the chunk area, or copy and paste the text into the console window.

## This code will not be evaluated automatically.
## (Notice the eval = FALSE declaration in the options section of the
## code chunk)

my_packages <- c("tidyverse", "broom", "coefplot", "colorblindr" "cowplot",
                 "dichromat" "gapminder", "GGally", "ggrepel", "ggridges",
                 "ggthemes", "gridExtra", "here", "interplot", "margins",
                 "maps", "mapproj", "mapdata", "MASS", "quantreg",
                 "rlang", "scales", "survey", "srvyr", "statebins",
                 "viridis", "viridisLite", "devtools")

install.packages(my_packages, repos = "http://cran.rstudio.com")

devtools::install_github("kjhealy/socviz", "hrbrmstr/hrbrthemes", "kjhealy/myriad")

Set Up Your Project and Load Libraries

To begin we must load some libraries we will be using. If we do not load them, R will not be able to find the functions contained in these libraries. The tidyverse includes ggplot and other tools. We also load the socviz and gapminder libraries.

Notice that here, the braces at the start of the code chunk have some additional options set in them. There is the language, r, as before. This is required. Then there is the word setup, which is a label for your code chunk. Labels are useful to briefly say what the chunk does. Label names must be unique (no two chunks in the same document can have the same label) and cannot contain spaces. Then, after the comma, an option is set: include=FALSE. This tells R to run this code but not to include the output in the final document.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

gapminder
## # A tibble: 1,704 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # ... with 1,694 more rows

The remainder of this document contains the chapter headings for the book, and an empty code chunk in each section to get you started. Try knitting this document now by clicking the “Knit” button in the RStudio toolbar, or choosing File > Knit Document from the RStudio menu.

Look at Data

my_numbers <- c(1, 1, 4, 1, 1, 4, 1)
my_numbers
## [1] 1 1 4 1 1 4 1
4 + 1
## [1] 5
letters
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"

Get Started

my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
your_numbers <- c(5, 31, 71, 1, 3, 21, 6)

my_numbers
## [1]  1  2  3  1  3  5 25
mean(x = my_numbers)
## [1] 5.714286
mean(x = your_numbers)
## [1] 19.71429
mean(my_numbers)
## [1] 5.714286
my_summary <- summary(my_numbers)
my_summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.500   3.000   5.714   4.000  25.000
table(my_numbers)
## my_numbers
##  1  2  3  5 25 
##  2  1  2  1  1
sd(my_numbers)
## [1] 8.616153
my_numbers*5
## [1]   5  10  15   5  15  25 125
my_numbers + 1
## [1]  2  3  4  2  4  6 26
my_numbers + your_numbers
## [1]  6 33 74  2  6 26 31
class(my_numbers)
## [1] "numeric"
class(my_summary)
## [1] "summaryDefault" "table"
class(summary)
## [1] "function"
my_new_vector <- c(my_numbers, "Apple")
my_new_vector
## [1] "1"     "2"     "3"     "1"     "3"     "5"     "25"    "Apple"
class(my_new_vector)
## [1] "character"
titanic
##       fate    sex    n percent
## 1 perished   male 1364    62.0
## 2 perished female  126     5.7
## 3 survived   male  367    16.7
## 4 survived female  344    15.6
class(titanic)
## [1] "data.frame"
titanic$percent
## [1] 62.0  5.7 16.7 15.6
titanic_tb <- as_tibble(titanic)
titanic_tb
## # A tibble: 4 x 4
##   fate     sex        n percent
##   <fct>    <fct>  <dbl>   <dbl>
## 1 perished male    1364    62  
## 2 perished female   126     5.7
## 3 survived male     367    16.7
## 4 survived female   344    15.6
str(my_numbers)
##  num [1:7] 1 2 3 1 3 5 25
str(my_summary)
##  'summaryDefault' Named num [1:6] 1 1.5 3 5.71 4 ...
##  - attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
url <- "https://cdn.rawgit.com/kjhealy/viz-organdata/master/organdonation.csv"
organs <- read_csv(file = url)
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   country = col_character(),
##   world = col_character(),
##   opt = col_character(),
##   consent.law = col_character(),
##   consent.practice = col_character(),
##   consistent = col_character(),
##   ccode = col_character()
## )
## i Use `spec()` for the full column specifications.

Make a Plot

library(gapminder)
gapminder
## # A tibble: 1,704 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # ... with 1,694 more rows
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point()

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth(method = "gam") + scale_x_log10()
## `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth(method = "lm") +
    scale_x_log10(labels = scales::dollar)
## `geom_smooth()` using formula 'y ~ x'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = "purple"))

p + geom_point() + 
    geom_smooth(method = "loess") +
    scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))

p + geom_point(color = "purple") + 
    geom_smooth(method = "loess") + 
    scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))

p + geom_point(alpha = 0.3) + 
    geom_smooth(color = "orange", se = FALSE, size = 2, method = "lm") +
    scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))

p + geom_point(alpha = 0.3) + 
    geom_smooth(method = "gam") +
    scale_x_log10(labels = scales::dollar) +
    labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy",
         subtitle = "Data points are country-years",
         caption = "Source: Gapiminder.")
## `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent))

p + geom_point() + 
    geom_smooth(method = "loess") + 
    scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent, fill = continent))

p + geom_point() + 
    geom_smooth(method = "loess") +
    scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))

p + geom_point(mapping = aes(color = continent)) + 
    geom_smooth(method = "loess") +
    scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(mapping = aes(color = log(pop))) + scale_x_log10()

p_out <- p + geom_point() + geom_smooth(method = "loess") + scale_x_log10()
ggsave(filename = "my_figure.png", plot = p_out)
## Saving 12 x 9 in image
## `geom_smooth()` using formula 'y ~ x'
here()
## [1] "C:/Users/mdabrowski/Documents/R/dataVisualization"
ggsave(here("figures", "life_vs_gdp_gradient.pdf"), plot = p_out, 
       height = 8, width = 10, units = "in")
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

p + geom_smooth() + geom_point()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

p <- ggplot(data = gapminder, mapping = aes(x = pop, y = lifeExp))
p + geom_point()

p + geom_point() + scale_x_log10()

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + scale_x_sqrt()

p + geom_point() + scale_x_reverse()

p + geom_point() + scale_y_log10()

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = year))
p + geom_point()

gapminder
## # A tibble: 1,704 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # ... with 1,694 more rows
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = factor(year)))
p + geom_point()

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = factor(year)))

p + geom_point() + 
    geom_smooth(se = FALSE, method = "lm") +
    scale_x_log10(labels = scales::dollar) +
    labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy",
         subtitle = "Data points are country-years",
         caption = "Source: Gapiminder.")
## `geom_smooth()` using formula 'y ~ x'

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = country))

p + geom_point() + 
    geom_smooth(se = FALSE, method = "lm") +
    scale_x_log10(labels = scales::dollar) +
    labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy",
         subtitle = "Data points are country-years",
         caption = "Source: Gapiminder.") +
    theme(legend.position = "bottom")
## `geom_smooth()` using formula 'y ~ x'

Show the Right Numbers

p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line()

p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line(aes(group = country))

p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line(aes(group = country)) + facet_wrap(~ continent)

p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))

p + geom_line(color = "gray70", aes(group = country)) +
    geom_smooth(size = 1.1, method = "loess", se = FALSE) + 
    scale_y_log10(labels = scales::dollar) +
    facet_wrap(~ continent, ncol = 5) +
    labs(x = "Year", y = "GDP per capita", title = "GDP per capita on Five Continents")
## `geom_smooth()` using formula 'y ~ x'

glimpse(gss_sm)
## Rows: 2,867
## Columns: 32
## $ year        <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2...
## $ id          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
## $ ballot      <labelled> 1, 2, 3, 1, 3, 2, 1, 3, 1, 3, 2, 1, 2, 3, 2, 3, 3,...
## $ age         <dbl> 47, 61, 72, 43, 55, 53, 50, 23, 45, 71, 33, 86, 32, 60,...
## $ childs      <dbl> 3, 0, 2, 4, 2, 2, 2, 3, 3, 4, 5, 4, 3, 5, 7, 2, 6, 5, 0...
## $ sibs        <labelled> 2, 3, 3, 3, 2, 2, 2, 6, 5, 1, 4, 4, 3, 6, 0, 1, 3,...
## $ degree      <fct> Bachelor, High School, Bachelor, High School, Graduate,...
## $ race        <fct> White, White, White, White, White, White, White, Other,...
## $ sex         <fct> Male, Male, Male, Female, Female, Female, Male, Female,...
## $ region      <fct> New England, New England, New England, New England, New...
## $ income16    <fct> $170000 or over, $50000 to 59999, $75000 to $89999, $17...
## $ relig       <fct> None, None, Catholic, Catholic, None, None, None, Catho...
## $ marital     <fct> Married, Never Married, Married, Married, Married, Marr...
## $ padeg       <fct> Graduate, Lt High School, High School, NA, Bachelor, NA...
## $ madeg       <fct> High School, High School, Lt High School, High School, ...
## $ partyid     <fct> "Independent", "Ind,near Dem", "Not Str Republican", "N...
## $ polviews    <fct> Moderate, Liberal, Conservative, Moderate, Slightly Lib...
## $ happy       <fct> Pretty Happy, Pretty Happy, Very Happy, Pretty Happy, V...
## $ partners    <fct> NA, 1 Partner, 1 Partner, NA, 1 Partner, 1 Partner, NA,...
## $ grass       <fct> NA, Legal, Not Legal, NA, Legal, Legal, NA, Not Legal, ...
## $ zodiac      <fct> Aquarius, Scorpio, Pisces, Cancer, Scorpio, Scorpio, Ca...
## $ pres12      <labelled> 3, 1, 2, 2, 1, 1, NA, NA, NA, 2, NA, NA, 1, 1, 2, ...
## $ wtssall     <dbl> 0.9569935, 0.4784968, 0.9569935, 1.9139870, 1.4354903, ...
## $ income_rc   <fct> Gt $170000, Gt $50000, Gt $75000, Gt $170000, Gt $17000...
## $ agegrp      <fct> Age 45-55, Age 55-65, Age 65+, Age 35-45, Age 45-55, Ag...
## $ ageq        <fct> Age 34-49, Age 49-62, Age 62+, Age 34-49, Age 49-62, Ag...
## $ siblings    <fct> 2, 3, 3, 3, 2, 2, 2, 6+, 5, 1, 4, 4, 3, 6+, 0, 1, 3, 6+...
## $ kids        <fct> 3, 0, 2, 4+, 2, 2, 2, 3, 3, 4+, 4+, 4+, 3, 4+, 4+, 2, 4...
## $ religion    <fct> None, None, Catholic, Catholic, None, None, None, Catho...
## $ bigregion   <fct> Northeast, Northeast, Northeast, Northeast, Northeast, ...
## $ partners_rc <fct> NA, 1, 1, NA, 1, 1, NA, 1, NA, 3, 1, NA, 1, NA, 0, 1, 0...
## $ obama       <dbl> 0, 1, 0, 0, 1, 1, NA, NA, NA, 0, NA, NA, 1, 1, 0, 1, 0,...
p <- ggplot(data = gss_sm, mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_grid(sex ~ race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).

p <- ggplot(data = gss_sm, mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_grid(sex ~ race + degree)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 62.87
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 2.13
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 582.26
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : span too small. fewer
## data values than degrees of freedom.
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
## 62.87
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius 2.13
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
## number 0
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other near
## singularities as well. 582.26
## Warning: Removed 18 rows containing missing values (geom_point).
## Warning in max(ids, na.rm = TRUE): brak argumentów w max; zwracanie wartości -
## Inf

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar()

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar(mapping = aes(y = ..prop..))

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar(mapping = aes(y = ..prop.., group = 1))

table(gss_sm$religion)
## 
## Protestant   Catholic     Jewish       None      Other 
##       1371        649         51        619        159
p <- ggplot(data = gss_sm, mapping = aes(x = religion, color = religion))
p + geom_bar()

p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
p + geom_bar() + guides(fill = FALSE)

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar()

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position = "fill")

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position = "dodge", mapping = aes(y = ..prop..))

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position = "dodge", mapping = aes(y = ..prop.., group = religion))

p <- ggplot(data = gss_sm, mapping = aes(x = religion))
p + geom_bar(position = "dodge", mapping = aes(y = ..prop.., group = bigregion)) + facet_wrap(~ bigregion, ncol = 2)

p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_histogram(bins = 10)

oh_wi <- c("OH", "WI")
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi), mapping = aes(x = percollege, fill = state))
p + geom_histogram(alpha = 0.4, bins = 20)

p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_density()

p <- ggplot(data = midwest, mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3)

p <- ggplot(data = midwest, mapping = aes(x = area, color = state))
p + geom_line(stat="density")

p <- ggplot(data = subset(midwest, subset = state %in% oh_wi), mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3, mapping = aes(y = ..scaled..))

p <- ggplot(data = subset(midwest, subset = state %in% oh_wi), mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3, mapping = aes(y = ..count..))

titanic
##       fate    sex    n percent
## 1 perished   male 1364    62.0
## 2 perished female  126     5.7
## 3 survived   male  367    16.7
## 4 survived female  344    15.6
p <- ggplot(data = titanic, mapping = aes(x = fate, y = percent, fill = sex))
p + geom_bar(position = "dodge", stat = "identity") + theme(legend.position = "top")

oecd_sum
## # A tibble: 57 x 5
## # Groups:   year [57]
##     year other   usa  diff hi_lo
##    <int> <dbl> <dbl> <dbl> <chr>
##  1  1960  68.6  69.9 1.3   Below
##  2  1961  69.2  70.4 1.2   Below
##  3  1962  68.9  70.2 1.30  Below
##  4  1963  69.1  70   0.9   Below
##  5  1964  69.5  70.3 0.800 Below
##  6  1965  69.6  70.3 0.7   Below
##  7  1966  69.9  70.3 0.400 Below
##  8  1967  70.1  70.7 0.6   Below
##  9  1968  70.1  70.4 0.3   Below
## 10  1969  70.1  70.6 0.5   Below
## # ... with 47 more rows
p <- ggplot(data = oecd_sum, mapping = aes(x = year, y = diff, fill = hi_lo))

p + geom_col() + guides(fill = FALSE) +
    labs(x = NULL, y = "Difference in Years", title = "The US Life Expectancy Gap",
         subtitle = "Difference between US and OECD average life expectancies, 1960-2015",
         caption = "Data: OECD. After a chart by Christopher Ingraham, Washington Post, December 27th 2017.")
## Warning: Removed 1 rows containing missing values (position_stack).

p <- ggplot(data = oecd_sum, mapping = aes(x = year, y = diff, fill = hi_lo))

p + geom_bar(position = "identity", stat = "identity") + guides(fill = FALSE) +
    labs(x = NULL, y = "Difference in Years", title = "The US Life Expectancy Gap",
         subtitle = "Difference between US and OECD average life expectancies, 1960-2015",
         caption = "Data: OECD. After a chart by Christopher Ingraham, Washington Post, December 27th 2017.")
## Warning: Removed 1 rows containing missing values (geom_bar).

p <- ggplot(data = gapminder, mapping = aes(x = pop, y = gdpPercap))
p + geom_point(mapping = aes(color = continent)) + scale_x_log10() + scale_y_log10(labels = scales::dollar) + facet_wrap(~ year)

p <- ggplot(data = gapminder, mapping = aes(x = year, y = pop))
p_out <- p + geom_line(aes(group = country)) + scale_y_log10() + facet_wrap(~country)

ggsave(here("figures", "pop_vs_year_countries.pdf"), plot = p_out, 
       height = 12, width = 20, units = "in")

p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p_out <- p + geom_line(aes(group = country)) + scale_y_log10() + facet_wrap(~country)

ggsave(here("figures", "dgpPercap_vs_year_countries.pdf"), plot = p_out, 
       height = 12, width = 20, units = "in")
p <- ggplot(data = gss_sm, mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_grid(sex ~ race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).

p <- ggplot(data = gss_sm, mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_grid(~ sex + race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).

## Warning: Removed 18 rows containing missing values (geom_point).

p <- ggplot(data = gss_sm, mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_wrap(~ sex + race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).

## Warning: Removed 18 rows containing missing values (geom_point).

p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_freqpoly(size = 1.5,)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_freqpoly(size = 1.5, bins = 10)

oh_wi <- c("OH", "WI")
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi), mapping = aes(x = percollege, color = state))
p + geom_freqpoly(alpha = 0.4, size = 1.5, bins = 20)

p <- ggplot(data = gapminder, mapping = aes(x = lifeExp, y = gdpPercap))
p + geom_bin2d(bins = c(20, 50))  # binwidth = 5

p <- ggplot(data = midwest, mapping = aes(x = percollege, y = percbelowpoverty))
p + geom_density_2d()

p <- ggplot(data = midwest, mapping = aes(x = percollege, y = percbelowpoverty))
p + geom_density_2d() + geom_point()

Graph Tables, Make Labels, Add Notes

rel_by_region <- gss_sm %>% 
    group_by(bigregion, religion) %>%
    summarize(N = n()) %>%
    mutate(freq = N / sum(N),
           pct = round(freq*100, 0))
## `summarise()` has grouped output by 'bigregion'. You can override using the `.groups` argument.
rel_by_region
## # A tibble: 24 x 5
## # Groups:   bigregion [4]
##    bigregion religion       N    freq   pct
##    <fct>     <fct>      <int>   <dbl> <dbl>
##  1 Northeast Protestant   158 0.324      32
##  2 Northeast Catholic     162 0.332      33
##  3 Northeast Jewish        27 0.0553      6
##  4 Northeast None         112 0.230      23
##  5 Northeast Other         28 0.0574      6
##  6 Northeast <NA>           1 0.00205     0
##  7 Midwest   Protestant   325 0.468      47
##  8 Midwest   Catholic     172 0.247      25
##  9 Midwest   Jewish         3 0.00432     0
## 10 Midwest   None         157 0.226      23
## # ... with 14 more rows
rel_by_region %>% group_by(bigregion) %>% summarize(total = sum(pct))
## # A tibble: 4 x 2
##   bigregion total
## * <fct>     <dbl>
## 1 Northeast   100
## 2 Midwest     101
## 3 South       100
## 4 West        101
p <- ggplot(rel_by_region, aes(x = bigregion, y = pct, fill = religion))

p + geom_col(position = "dodge") +
    labs(x = "Region", y = "Percent", fill = "Religion") +
    theme(legend.position = "top")

p <- ggplot(rel_by_region, aes(x = bigregion, y = pct, fill = religion))

p + geom_col(position = "dodge2") +
    labs(x = "Region", y = "Percent", fill = "Religion") +
    theme(legend.position = "top")

p <- ggplot(rel_by_region, aes(x = religion, y = pct, fill = religion))

p + geom_col(position = "dodge2") +
    labs(x = NULL, y = "Percent", fill = "Religion") +
    guides(fill = FALSE) +
    coord_flip() +
    facet_grid(~ bigregion)

head(organdata)
## # A tibble: 6 x 21
##   country year       donors   pop pop_dens   gdp gdp_lag health health_lag
##   <chr>   <date>      <dbl> <int>    <dbl> <int>   <int>  <dbl>      <dbl>
## 1 Austra~ NA           NA   17065    0.220 16774   16591   1300       1224
## 2 Austra~ 1991-01-01   12.1 17284    0.223 17171   16774   1379       1300
## 3 Austra~ 1992-01-01   12.4 17495    0.226 17914   17171   1455       1379
## 4 Austra~ 1993-01-01   12.5 17667    0.228 18883   17914   1540       1455
## 5 Austra~ 1994-01-01   10.2 17855    0.231 19849   18883   1626       1540
## 6 Austra~ 1995-01-01   10.2 18072    0.233 21079   19849   1737       1626
## # ... with 12 more variables: pubhealth <dbl>, roads <dbl>, cerebvas <int>,
## #   assault <int>, external <int>, txp_pop <dbl>, world <chr>, opt <chr>,
## #   consent_law <chr>, consent_practice <chr>, consistent <chr>, ccode <chr>
organdata %>% select(1:6) %>% sample_n(size = 10)
## # A tibble: 10 x 6
##    country        year       donors    pop pop_dens   gdp
##    <chr>          <date>      <dbl>  <int>    <dbl> <int>
##  1 Ireland        1991-01-01   19     3534     5.03 13495
##  2 United Kingdom 2000-01-01   13.2  58817    24.2  25271
##  3 Belgium        2001-01-01   22.2  10287    31.1  27113
##  4 Finland        2002-01-01   17.1   5201     1.54 26616
##  5 United Kingdom 1999-01-01   12.1  58635    24.1  24086
##  6 Switzerland    1992-01-01   14.8   6875    16.7  25135
##  7 Ireland        2001-01-01   18.2   3863     5.50 29703
##  8 United States  1992-01-01   17.6 256514     2.66 24411
##  9 Ireland        1997-01-01   20.9   3673     5.23 22017
## 10 Netherlands    1995-01-01   15.2  15459    37.2  21723
p <- ggplot(data = organdata, mapping = aes(x = year, y = donors))
p + geom_point()
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot(data = organdata, mapping = aes(x = year, y = donors))
p + geom_line(aes(group = country)) + facet_wrap(~ country)
## Warning: Removed 34 row(s) containing missing values (geom_path).

p <- ggplot(data = organdata, mapping = aes(x = country, y = donors))
p + geom_boxplot()      # outlier.shape, outlier.color
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).

p <- ggplot(data = organdata, mapping = aes(x = country, y = donors))
p + geom_boxplot() + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).

p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors))
p + geom_boxplot() + labs(x = NULL) + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).

p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors))
p + geom_violin() + labs(x = NULL) + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_ydensity).

p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, sd, na.rm = TRUE), y = donors))
p + geom_boxplot() + labs(x = NULL) + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).

p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, median, na.rm = TRUE), y = donors))
p + geom_boxplot() + labs(x = NULL) + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).

p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, fill = world))
p + geom_boxplot() + labs(x = NULL) + coord_flip() + 
    theme(legend.position = "top")
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).

p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, color = world))
p + geom_point() + labs(x = NULL) + coord_flip() + 
    theme(legend.position = "top")
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, color = world))
p + geom_jitter() + labs(x = NULL) + coord_flip() + 
    theme(legend.position = "top")
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, color = world))
p + geom_jitter(position = position_jitter(width = 0.15)) + labs(x = NULL) + coord_flip() + 
    theme(legend.position = "top")
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, color = world))
p + geom_jitter(position = position_jitter(height = 0.15)) + labs(x = NULL) + coord_flip() + 
    theme(legend.position = "top")
## Warning: Removed 34 rows containing missing values (geom_point).

glimpse(organdata)
## Rows: 238
## Columns: 21
## $ country          <chr> "Australia", "Australia", "Australia", "Australia"...
## $ year             <date> NA, 1991-01-01, 1992-01-01, 1993-01-01, 1994-01-0...
## $ donors           <dbl> NA, 12.09, 12.35, 12.51, 10.25, 10.18, 10.59, 10.2...
## $ pop              <int> 17065, 17284, 17495, 17667, 17855, 18072, 18311, 1...
## $ pop_dens         <dbl> 0.2204433, 0.2232723, 0.2259980, 0.2282198, 0.2306...
## $ gdp              <int> 16774, 17171, 17914, 18883, 19849, 21079, 21923, 2...
## $ gdp_lag          <int> 16591, 16774, 17171, 17914, 18883, 19849, 21079, 2...
## $ health           <dbl> 1300, 1379, 1455, 1540, 1626, 1737, 1846, 1948, 20...
## $ health_lag       <dbl> 1224, 1300, 1379, 1455, 1540, 1626, 1737, 1846, 19...
## $ pubhealth        <dbl> 4.8, 5.4, 5.4, 5.4, 5.4, 5.5, 5.6, 5.7, 5.9, 6.1, ...
## $ roads            <dbl> 136.59537, 122.25179, 112.83224, 110.54508, 107.98...
## $ cerebvas         <int> 682, 647, 630, 611, 631, 592, 576, 525, 516, 493, ...
## $ assault          <int> 21, 19, 17, 18, 17, 16, 17, 17, 16, 15, 16, 15, 14...
## $ external         <int> 444, 425, 406, 376, 387, 371, 395, 385, 410, 409, ...
## $ txp_pop          <dbl> 0.9375916, 0.9257116, 0.9145470, 0.9056433, 0.8961...
## $ world            <chr> "Liberal", "Liberal", "Liberal", "Liberal", "Liber...
## $ opt              <chr> "In", "In", "In", "In", "In", "In", "In", "In", "I...
## $ consent_law      <chr> "Informed", "Informed", "Informed", "Informed", "I...
## $ consent_practice <chr> "Informed", "Informed", "Informed", "Informed", "I...
## $ consistent       <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "...
## $ ccode            <chr> "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "O...
by_country <- organdata %>% group_by(consent_law, country) %>%
    summarize(donors_mean = mean(donors, na.rm = TRUE),
              donors_sd = sd(donors, na.rm = TRUE),
              gdp_mean = mean(gdp, na.rm = TRUE),
              health_mean = mean(health, na.rm = TRUE),
              roads_mean = mean(roads, na.rm = TRUE),
              cerebvas_mean = mean(cerebvas, na.rm = TRUE))
## `summarise()` has grouped output by 'consent_law'. You can override using the `.groups` argument.
by_country
## # A tibble: 17 x 8
## # Groups:   consent_law [2]
##    consent_law country donors_mean donors_sd gdp_mean health_mean roads_mean
##    <chr>       <chr>         <dbl>     <dbl>    <dbl>       <dbl>      <dbl>
##  1 Informed    Austra~        10.6     1.14    22179.       1958.      105. 
##  2 Informed    Canada         14.0     0.751   23711.       2272.      109. 
##  3 Informed    Denmark        13.1     1.47    23722.       2054.      102. 
##  4 Informed    Germany        13.0     0.611   22163.       2349.      113. 
##  5 Informed    Ireland        19.8     2.48    20824.       1480.      118. 
##  6 Informed    Nether~        13.7     1.55    23013.       1993.       76.1
##  7 Informed    United~        13.5     0.775   21359.       1561.       67.9
##  8 Informed    United~        20.0     1.33    29212.       3988.      155. 
##  9 Presumed    Austria        23.5     2.42    23876.       1875.      150. 
## 10 Presumed    Belgium        21.9     1.94    22500.       1958.      155. 
## 11 Presumed    Finland        18.4     1.53    21019.       1615.       93.6
## 12 Presumed    France         16.8     1.60    22603.       2160.      156. 
## 13 Presumed    Italy          11.1     4.28    21554.       1757       122. 
## 14 Presumed    Norway         15.4     1.11    26448.       2217.       70.0
## 15 Presumed    Spain          28.1     4.96    16933        1289.      161. 
## 16 Presumed    Sweden         13.1     1.75    22415.       1951.       72.3
## 17 Presumed    Switze~        14.2     1.71    27233        2776.       96.4
## # ... with 1 more variable: cerebvas_mean <dbl>
by_year <- organdata %>% group_by(consent_law, year) %>%
    summarize(donors_mean = mean(donors, na.rm = TRUE),
              donors_sd = sd(donors, na.rm = TRUE),
              gdp_mean = mean(gdp, na.rm = TRUE),
              health_mean = mean(health, na.rm = TRUE),
              roads_mean = mean(roads, na.rm = TRUE),
              cerebvas_mean = mean(cerebvas, na.rm = TRUE))
## `summarise()` has grouped output by 'consent_law'. You can override using the `.groups` argument.
by_year
## # A tibble: 26 x 8
## # Groups:   consent_law [2]
##    consent_law year       donors_mean donors_sd gdp_mean health_mean roads_mean
##    <chr>       <date>           <dbl>     <dbl>    <dbl>       <dbl>      <dbl>
##  1 Informed    1991-01-01        14.7      2.58   18160.       1642.      121. 
##  2 Informed    1992-01-01        15.2      2.44   19001.       1752.      113. 
##  3 Informed    1993-01-01        15.0      1.99   19657.       1829.      111. 
##  4 Informed    1994-01-01        14.5      3.49   20724.       1912.      108. 
##  5 Informed    1995-01-01        15.6      4.60   21789.       1996.      110. 
##  6 Informed    1996-01-01        14.6      2.85   22724.       2080.      105. 
##  7 Informed    1997-01-01        14.7      3.82   24084.       2169.      102. 
##  8 Informed    1998-01-01        14.8      4.86   25214.       2273.       97.9
##  9 Informed    1999-01-01        14.1      3.99   26458        2404.       96.2
## 10 Informed    2000-01-01        14.4      3.52   27841.       2532.       95.7
## # ... with 16 more rows, and 1 more variable: cerebvas_mean <dbl>
by_country <- organdata %>% group_by(consent_law, country) %>%
    summarize_if(is.numeric, funs(mean, sd), na.rm = TRUE) %>%
    ungroup()
## Warning: `funs()` is deprecated as of dplyr 0.8.0.
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
by_country
## # A tibble: 17 x 28
##    consent_law country donors_mean pop_mean pop_dens_mean gdp_mean gdp_lag_mean
##    <chr>       <chr>         <dbl>    <dbl>         <dbl>    <dbl>        <dbl>
##  1 Informed    Austra~        10.6   18318.         0.237   22179.       21779.
##  2 Informed    Canada         14.0   29608.         0.297   23711.       23353.
##  3 Informed    Denmark        13.1    5257.        12.2     23722.       23275 
##  4 Informed    Germany        13.0   80255.        22.5     22163.       21938.
##  5 Informed    Ireland        19.8    3674.         5.23    20824.       20154.
##  6 Informed    Nether~        13.7   15548.        37.4     23013.       22554.
##  7 Informed    United~        13.5   58187.        24.0     21359.       20962.
##  8 Informed    United~        20.0  269330.         2.80    29212.       28699.
##  9 Presumed    Austria        23.5    7927.         9.45    23876.       23415.
## 10 Presumed    Belgium        21.9   10153.        30.7     22500.       22096.
## 11 Presumed    Finland        18.4    5112.         1.51    21019.       20763 
## 12 Presumed    France         16.8   58056.        10.5     22603.       22211.
## 13 Presumed    Italy          11.1   57360.        19.0     21554.       21195.
## 14 Presumed    Norway         15.4    4386.         1.35    26448.       25769.
## 15 Presumed    Spain          28.1   39666.         7.84    16933        16584.
## 16 Presumed    Sweden         13.1    8789.         1.95    22415.       22094 
## 17 Presumed    Switze~        14.2    7037.        17.0     27233        26931.
## # ... with 21 more variables: health_mean <dbl>, health_lag_mean <dbl>,
## #   pubhealth_mean <dbl>, roads_mean <dbl>, cerebvas_mean <dbl>,
## #   assault_mean <dbl>, external_mean <dbl>, txp_pop_mean <dbl>,
## #   donors_sd <dbl>, pop_sd <dbl>, pop_dens_sd <dbl>, gdp_sd <dbl>,
## #   gdp_lag_sd <dbl>, health_sd <dbl>, health_lag_sd <dbl>, pubhealth_sd <dbl>,
## #   roads_sd <dbl>, cerebvas_sd <dbl>, assault_sd <dbl>, external_sd <dbl>,
## #   txp_pop_sd <dbl>
p <- ggplot(data = by_country, mapping = aes(x = donors_mean, y = reorder(country, donors_mean), color = consent_law))

p + geom_point(size = 3) +
    labs(x = "Donor Procurement Rate", y = NULL, color = "Consent Law") +
    theme(legend.position = "top")

p <- ggplot(data = by_country, mapping = aes(x = donors_mean, y = reorder(country, donors_mean)))

p + geom_point(size = 3) +
    facet_wrap(~ consent_law, scales = "free_y", ncol = 1) +
    labs(x = "Donor Procurement Rate", y = NULL) 

p <- ggplot(data = by_country, mapping = aes(x = reorder(country, donors_mean), y = donors_mean))

p + geom_pointrange(mapping = aes(ymin = donors_mean - donors_sd, ymax = donors_mean + donors_sd)) +
    labs(x = NULL, y = "Donor Procurement Rate") + coord_flip()

p <- ggplot(data = by_country, mapping = aes(x = roads_mean, y = donors_mean))
p + geom_point() + geom_text(mapping = aes(label = country))

p <- ggplot(data = by_country, mapping = aes(x = roads_mean, y = donors_mean))
p + geom_point() + geom_text(mapping = aes(label = country), hjust = 0)

p <- ggplot(data = by_country, mapping = aes(x = roads_mean, y = donors_mean))
p + geom_point() + geom_text(mapping = aes(label = country), hjust = 1)

library(ggrepel)

elections_historic %>% select(2:7)
## # A tibble: 49 x 6
##     year winner                 win_party ec_pct popular_pct popular_margin
##    <int> <chr>                  <chr>      <dbl>       <dbl>          <dbl>
##  1  1824 John Quincy Adams      D.-R.      0.322       0.309        -0.104 
##  2  1828 Andrew Jackson         Dem.       0.682       0.559         0.122 
##  3  1832 Andrew Jackson         Dem.       0.766       0.547         0.178 
##  4  1836 Martin Van Buren       Dem.       0.578       0.508         0.142 
##  5  1840 William Henry Harrison Whig       0.796       0.529         0.0605
##  6  1844 James Polk             Dem.       0.618       0.495         0.0145
##  7  1848 Zachary Taylor         Whig       0.562       0.473         0.0479
##  8  1852 Franklin Pierce        Dem.       0.858       0.508         0.0695
##  9  1856 James Buchanan         Dem.       0.588       0.453         0.122 
## 10  1860 Abraham Lincoln        Rep.       0.594       0.396         0.101 
## # ... with 39 more rows
p_title <- "Presidential Elections: Popular & Electoral College Margins"
p_subtitle <- "1824-2016"
p_caption <- "Data for 2016 are provisional."
x_label <- "Winner's share of Popular Vote"
y_label <- "Winner's share of Electoral College Votes"

p <- ggplot(elections_historic, aes(x = popular_pct, y = ec_pct, label = winner_label))

p + geom_hline(yintercept = 0.5, size = 1.4, color = "gray80") +
    geom_vline(xintercept = 0.5, size = 1.4, color = "gray80") +
    geom_abline(color = "gray80") +
    geom_point() +
    geom_text_repel() +
    scale_x_continuous(labels = scales::percent) +
    scale_y_continuous(labels = scales::percent) +
    labs(x = x_label, y = y_label, title = p_title, subtitle = p_subtitle, caption = p_caption)
## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

library(ggrepel)

p <- ggplot(data = by_country, mapping = aes(x = gdp_mean, y = health_mean))

p + geom_point() +
    geom_text_repel(data = subset(by_country, gdp_mean > 25000),
                    mapping = aes(label = country))

p <- ggplot(data = by_country, mapping = aes(x = gdp_mean, y = health_mean))

p + geom_point() +
    geom_text_repel(data = subset(by_country, gdp_mean > 25000 | health_mean < 1500 | country %in% "Belgium"),
                    mapping = aes(label = country))

organdata$ind <- organdata$ccode %in% c("Ita", "Spa") & organdata$year > 1998

p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = ind))

p + geom_point() +
    geom_text_repel(data = subset(organdata, ind), mapping = aes(label = ccode)) +
    guides(label = FALSE, color = FALSE)
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors))

p + geom_point(aes(color = ind)) +
    geom_text_repel(data = subset(organdata, ind), mapping = aes(label = ccode)) +
    guides(label = FALSE, color = FALSE)
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors))

p + geom_point() +
    annotate(geom = "text", x = 91, y = 33, label = "A suprisingly high \n recovery rate.", hjust = 0)
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors))

p + geom_point() +
    annotate(geom = "rect", xmin = 125, xmax = 155, ymin = 30, ymax = 35, fill = "red", alpha = 0.2) +
    annotate(geom = "text", x = 157, y = 33, label = "A suprisingly high \n recovery rate.", hjust = 0)
## Warning: Removed 34 rows containing missing values (geom_point).

# scale_<MAPPING>_<KIND>()

p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))
p + geom_point()
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))

p + geom_point() +
    scale_x_log10() +
    scale_y_continuous(breaks = c(5, 15, 25), labels = c("Five", "Fifteen", "Twenty Five"))
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))

p + geom_point() +
    scale_color_discrete(labels = c("Corporatist", "Liberal", "Social Democratic", "Unclassified")) +
    labs(x = "Road Deaths", y = "Donor Procurement", color = "Welfare State")
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))

p + geom_point() +
    labs(x = "Road Deaths", y = "Donor Procurement") +
    guides(color = FALSE)
## Warning: Removed 34 rows containing missing values (geom_point).

library(ggrepel)
glimpse(elections_historic)
## Rows: 49
## Columns: 19
## $ election       <int> 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, ...
## $ year           <int> 1824, 1828, 1832, 1836, 1840, 1844, 1848, 1852, 1856...
## $ winner         <chr> "John Quincy Adams", "Andrew Jackson", "Andrew Jacks...
## $ win_party      <chr> "D.-R.", "Dem.", "Dem.", "Dem.", "Whig", "Dem.", "Wh...
## $ ec_pct         <dbl> 0.3218, 0.6820, 0.7657, 0.5782, 0.7959, 0.6182, 0.56...
## $ popular_pct    <dbl> 0.3092, 0.5593, 0.5474, 0.5079, 0.5287, 0.4954, 0.47...
## $ popular_margin <dbl> -0.1044, 0.1225, 0.1781, 0.1420, 0.0605, 0.0145, 0.0...
## $ votes          <int> 113142, 642806, 702735, 763291, 1275583, 1339570, 13...
## $ margin         <int> -38221, 140839, 228628, 213384, 145938, 39413, 13788...
## $ runner_up      <chr> "Andrew Jackson", "John Quincy Adams", "Henry Clay",...
## $ ru_part        <chr> "D.-R.", "N. R.", "N. R.", "Whig", "Dem.", "Whig", "...
## $ turnout_pct    <dbl> 0.269, 0.573, 0.570, 0.565, 0.803, 0.792, 0.728, 0.6...
## $ winner_lname   <chr> "Adams", "Jackson", "Jackson", "Buren", "Harrison", ...
## $ winner_label   <chr> "Adams 1824", "Jackson 1828", "Jackson 1832", "Buren...
## $ ru_lname       <chr> "Jackson", "Adams", "Clay", "Harrison", "Buren", "Cl...
## $ ru_label       <chr> "Jackson 1824", "Adams 1828", "Clay 1832", "Harrison...
## $ two_term       <lgl> FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE...
## $ ec_votes       <dbl> 84, 178, 219, 170, 234, 170, 163, 254, 174, 180, 212...
## $ ec_denom       <dbl> 261, 261, 286, 294, 294, 275, 290, 296, 296, 303, 23...
p_title <- "Presidential Elections: Popular & Electoral College Margins"
p_subtitle <- "1824-2016"
p_caption <- "Data for 2016 are provisional."
x_label <- "Winner's share of Popular Vote"
y_label <- "Winner's share of Electoral College Votes"

p <- ggplot(elections_historic, aes(x = popular_pct, y = ec_pct, label = winner_label))

p + geom_hline(yintercept = 0.5, size = 1.4, color = "gray80") +
    geom_vline(xintercept = 0.5, size = 1.4, color = "gray80") +
    geom_point(aes(color = ru_part)) +
    geom_text_repel(data = subset(elections_historic, year >= 1992), 
                    mapping = aes(color = ru_part),
                    show.legend = FALSE) +
    annotate(geom = "rect", xmin = 0.3, xmax = 0.5, ymin = 0.5, ymax = 1, fill = "steelblue", alpha = 0.1) +
    scale_x_continuous(labels = scales::percent) +
    scale_y_continuous(labels = scales::percent) +
    labs(x = x_label, y = y_label, title = p_title, subtitle = p_subtitle, caption = p_caption)

p <- ggplot(data = elections_historic, mapping = aes(x = reorder(year, popular_pct), y = popular_pct))

p + geom_point(size = 3, aes(color = ru_part)) +
    labs(x = "Donor Procurement Rate", y = NULL) +
    scale_y_continuous(labels = scales::percent) +
    coord_flip()

life_by_continent <- gapminder %>% filter(year == 2007) %>% group_by(continent) %>% summarize(m = mean(lifeExp))
life_by_continent
## # A tibble: 5 x 2
##   continent     m
## * <fct>     <dbl>
## 1 Africa     54.8
## 2 Americas   73.6
## 3 Asia       70.7
## 4 Europe     77.6
## 5 Oceania    80.7
p <- ggplot(data = life_by_continent, mapping = aes(x = reorder(continent, m), y = m))

p + geom_col() + 
    labs(x = NULL, y = "Life expectancy in years, 2007") +
    coord_flip()

p <- ggplot(data = life_by_continent, mapping = aes(x = reorder(continent, m), y = m))

p + geom_point(size = 5) + 
    labs(x = NULL, y = "Life expectancy in years, 2007") +
    coord_flip()

degree_by_race <- gss_sm %>% group_by(race, degree) %>% summarize(N = n()) %>% mutate(pct = round(N/sum(N)*100, 0))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
degree_by_race
## # A tibble: 18 x 4
## # Groups:   race [3]
##    race  degree             N   pct
##    <fct> <fct>          <int> <dbl>
##  1 White Lt High School   197     9
##  2 White High School     1057    50
##  3 White Junior College   166     8
##  4 White Bachelor         426    20
##  5 White Graduate         250    12
##  6 White <NA>               4     0
##  7 Black Lt High School    60    12
##  8 Black High School      292    60
##  9 Black Junior College    33     7
## 10 Black Bachelor          71    14
## 11 Black Graduate          31     6
## 12 Black <NA>               3     1
## 13 Other Lt High School    71    26
## 14 Other High School      112    40
## 15 Other Junior College    17     6
## 16 Other Bachelor          39    14
## 17 Other Graduate          37    13
## 18 Other <NA>               1     0
degree_by_race %>% group_by(race) %>% summarize(total = sum(pct))
## # A tibble: 3 x 2
##   race  total
## * <fct> <dbl>
## 1 White    99
## 2 Black   100
## 3 Other    99
degree_by_sex <- gss_sm %>% group_by(sex, degree) %>% summarize(N = n()) %>% mutate(pct = round(N/sum(N)*100, 0))
## `summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
degree_by_sex
## # A tibble: 12 x 4
## # Groups:   sex [2]
##    sex    degree             N   pct
##    <fct>  <fct>          <int> <dbl>
##  1 Male   Lt High School   147    12
##  2 Male   High School      662    52
##  3 Male   Junior College    89     7
##  4 Male   Bachelor         243    19
##  5 Male   Graduate         132    10
##  6 Male   <NA>               3     0
##  7 Female Lt High School   181    11
##  8 Female High School      799    50
##  9 Female Junior College   127     8
## 10 Female Bachelor         293    18
## 11 Female Graduate         186    12
## 12 Female <NA>               5     0
degree_by_sex %>% group_by(sex) %>% summarize(total = sum(pct))
## # A tibble: 2 x 2
##   sex    total
## * <fct>  <dbl>
## 1 Male     100
## 2 Female    99
degree_by_region <- gss_sm %>% group_by(region, degree) %>% summarize(N = n()) %>% mutate(pct = round(N/sum(N)*100, 0))
## `summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
degree_by_region
## # A tibble: 49 x 4
## # Groups:   region [9]
##    region          degree             N   pct
##    <fct>           <fct>          <int> <dbl>
##  1 New England     Lt High School    13     7
##  2 New England     High School       77    44
##  3 New England     Junior College    19    11
##  4 New England     Bachelor          39    22
##  5 New England     Graduate          27    15
##  6 Middle Atlantic Lt High School    25     8
##  7 Middle Atlantic High School      160    51
##  8 Middle Atlantic Junior College    21     7
##  9 Middle Atlantic Bachelor          59    19
## 10 Middle Atlantic Graduate          47    15
## # ... with 39 more rows
degree_by_region %>% group_by(region) %>% summarize(total = sum(pct))
## # A tibble: 9 x 2
##   region          total
## * <fct>           <dbl>
## 1 New England        99
## 2 Middle Atlantic   100
## 3 E. Nor. Central   100
## 4 W. Nor. Central   100
## 5 South Atlantic    100
## 6 E. Sou. Central   100
## 7 W. Sou. Central   100
## 8 Mountain          100
## 9 Pacific           100
gss_sm %>% group_by(degree) %>% summarize(mean = mean(childs, na.rm = TRUE), median = median(childs, na.rm = TRUE))
## # A tibble: 6 x 3
##   degree          mean median
## * <fct>          <dbl>  <dbl>
## 1 Lt High School  2.81      3
## 2 High School     1.86      2
## 3 Junior College  1.77      2
## 4 Bachelor        1.45      1
## 5 Graduate        1.52      2
## 6 <NA>            3.6       4
str(gapminder)
## tibble [1,704 x 6] (S3: tbl_df/tbl/data.frame)
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
##  $ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
p <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = year))
p + geom_boxplot()

p <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = year))
p + geom_boxplot() + facet_wrap(~ continent)

p <- ggplot(data = gapminder, mapping = aes(x = year, y = pop, group = year))
p + geom_boxplot() + scale_y_log10() + facet_wrap(~ continent)

life_by_year <- gapminder %>% group_by(continent, year)
life_by_year    # this produce improper result
## # A tibble: 1,704 x 6
## # Groups:   continent, year [60]
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # ... with 1,694 more rows
p <- ggplot(data = life_by_year, mapping = aes(x = year, y = lifeExp), group = year)
p + geom_boxplot() + facet_wrap(~ continent)
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?

p <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = year))
p + geom_boxplot(notch = TRUE) + facet_wrap(~ continent)
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.
## notch went outside hinges. Try setting notch=FALSE.

p <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = year))
p + geom_boxplot(notch = TRUE, varwidth = TRUE)

p <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = year))
p + geom_violin()

by_country <- organdata %>% group_by(consent_law, country) %>%
    summarize_if(is.numeric, funs(mean, sd), na.rm = TRUE) %>%
    ungroup()

p <- ggplot(data = by_country, mapping = aes(x = reorder(country, donors_mean), y = donors_mean))

p + geom_pointrange(mapping = aes(ymin = donors_mean - donors_sd, ymax = donors_mean + donors_sd)) +
    labs(x = NULL, y = "Donor Procurement Rate") + coord_flip()

p + geom_linerange(mapping = aes(ymin = donors_mean - donors_sd, ymax = donors_mean + donors_sd)) +
    labs(x = NULL, y = "Donor Procurement Rate") + coord_flip()

p + geom_crossbar(mapping = aes(ymin = donors_mean - donors_sd, ymax = donors_mean + donors_sd)) +
    labs(x = NULL, y = "Donor Procurement Rate") + coord_flip()

p + geom_errorbar(mapping = aes(ymin = donors_mean - donors_sd, ymax = donors_mean + donors_sd)) +
    labs(x = NULL, y = "Donor Procurement Rate") + coord_flip()

Work with Models

p <- ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp))

p + geom_point(alpha = 0.1) +
    geom_smooth(color = "tomato", fill = "tomato", method = MASS::rlm) +
    geom_smooth(color = "steelblue", fill = "steelblue", method = "lm")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

p + geom_point(alpha = 0.1) +
    geom_smooth(color = "tomato", size = 1.2, method = "lm", formula = y ~ splines::bs(x, 3), se = FALSE)

p + geom_point(alpha = 0.1) +
    geom_quantile(color = "tomato", size = 1.2, method = "rqss", lambda = 1, quantiles = c(0.2, 0.5, 0.85))
## Smoothing formula not specified. Using: y ~ qss(x, lambda = 1)

model_colors <- RColorBrewer::brewer.pal(3, "Set1")
model_colors
## [1] "#E41A1C" "#377EB8" "#4DAF4A"
p0 <- ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp))

p1 <- p0 + geom_point(alpha = 0.2) +
    geom_smooth(method = "lm", aes(color = "OLS", fill = "OLS")) +
    geom_smooth(method = "lm", formula = y ~ splines::bs(x, df = 3), aes(color = "Cubic Spline", fill = "Cubic Spline")) +
    geom_smooth(method = "loess", aes(color = "LOESS", fill = "LOESS")) 

p1 + scale_color_manual(name = "Models", values = model_colors) +
     scale_fill_manual(name = "Models", values = model_colors) +
     theme(legend.position = "top")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

gapminder
## # A tibble: 1,704 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # ... with 1,694 more rows
str(gapminder)
## tibble [1,704 x 6] (S3: tbl_df/tbl/data.frame)
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
##  $ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
summary(gapminder)
##         country        continent        year         lifeExp     
##  Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60  
##  Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20  
##  Algeria    :  12   Asia    :396   Median :1980   Median :60.71  
##  Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47  
##  Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85  
##  Australia  :  12                  Max.   :2007   Max.   :82.60  
##  (Other)    :1632                                                
##       pop              gdpPercap       
##  Min.   :6.001e+04   Min.   :   241.2  
##  1st Qu.:2.794e+06   1st Qu.:  1202.1  
##  Median :7.024e+06   Median :  3531.8  
##  Mean   :2.960e+07   Mean   :  7215.3  
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5  
##  Max.   :1.319e+09   Max.   :113523.1  
## 
out <- lm(formula = lifeExp ~ gdpPercap + pop + continent, data = gapminder)
summary(out)
## 
## Call:
## lm(formula = lifeExp ~ gdpPercap + pop + continent, data = gapminder)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -49.161  -4.486   0.297   5.110  25.175 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.781e+01  3.395e-01 140.819  < 2e-16 ***
## gdpPercap         4.495e-04  2.346e-05  19.158  < 2e-16 ***
## pop               6.570e-09  1.975e-09   3.326 0.000901 ***
## continentAmericas 1.348e+01  6.000e-01  22.458  < 2e-16 ***
## continentAsia     8.193e+00  5.712e-01  14.342  < 2e-16 ***
## continentEurope   1.747e+01  6.246e-01  27.973  < 2e-16 ***
## continentOceania  1.808e+01  1.782e+00  10.146  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.365 on 1697 degrees of freedom
## Multiple R-squared:  0.5821, Adjusted R-squared:  0.5806 
## F-statistic: 393.9 on 6 and 1697 DF,  p-value: < 2.2e-16
str(out)
## List of 13
##  $ coefficients : Named num [1:7] 4.78e+01 4.50e-04 6.57e-09 1.35e+01 8.19 ...
##   ..- attr(*, "names")= chr [1:7] "(Intercept)" "gdpPercap" "pop" "continentAmericas" ...
##  $ residuals    : Named num [1:1704] -27.6 -26.1 -24.5 -22.4 -20.3 ...
##   ..- attr(*, "names")= chr [1:1704] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:1704] -2455.1 311.1 42.6 101.1 -17.2 ...
##   ..- attr(*, "names")= chr [1:1704] "(Intercept)" "gdpPercap" "pop" "continentAmericas" ...
##  $ rank         : int 7
##  $ fitted.values: Named num [1:1704] 56.4 56.4 56.5 56.5 56.4 ...
##   ..- attr(*, "names")= chr [1:1704] "1" "2" "3" "4" ...
##  $ assign       : int [1:7] 0 1 2 3 3 3 3
##  $ qr           :List of 5
##   ..$ qr   : num [1:1704, 1:7] -41.2795 0.0242 0.0242 0.0242 0.0242 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:1704] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:7] "(Intercept)" "gdpPercap" "pop" "continentAmericas" ...
##   .. ..- attr(*, "assign")= int [1:7] 0 1 2 3 3 3 3
##   .. ..- attr(*, "contrasts")=List of 1
##   .. .. ..$ continent: chr "contr.treatment"
##   ..$ qraux: num [1:7] 1.02 1.02 1 1.01 1.04 ...
##   ..$ pivot: int [1:7] 1 2 3 4 5 6 7
##   ..$ tol  : num 1e-07
##   ..$ rank : int 7
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 1697
##  $ contrasts    :List of 1
##   ..$ continent: chr "contr.treatment"
##  $ xlevels      :List of 1
##   ..$ continent: chr [1:5] "Africa" "Americas" "Asia" "Europe" ...
##  $ call         : language lm(formula = lifeExp ~ gdpPercap + pop + continent, data = gapminder)
##  $ terms        :Classes 'terms', 'formula'  language lifeExp ~ gdpPercap + pop + continent
##   .. ..- attr(*, "variables")= language list(lifeExp, gdpPercap, pop, continent)
##   .. ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:4] "lifeExp" "gdpPercap" "pop" "continent"
##   .. .. .. ..$ : chr [1:3] "gdpPercap" "pop" "continent"
##   .. ..- attr(*, "term.labels")= chr [1:3] "gdpPercap" "pop" "continent"
##   .. ..- attr(*, "order")= int [1:3] 1 1 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(lifeExp, gdpPercap, pop, continent)
##   .. ..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "numeric" "numeric" "factor"
##   .. .. ..- attr(*, "names")= chr [1:4] "lifeExp" "gdpPercap" "pop" "continent"
##  $ model        :'data.frame':   1704 obs. of  4 variables:
##   ..$ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
##   ..$ gdpPercap: num [1:1704] 779 821 853 836 740 ...
##   ..$ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##   ..$ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language lifeExp ~ gdpPercap + pop + continent
##   .. .. ..- attr(*, "variables")= language list(lifeExp, gdpPercap, pop, continent)
##   .. .. ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:4] "lifeExp" "gdpPercap" "pop" "continent"
##   .. .. .. .. ..$ : chr [1:3] "gdpPercap" "pop" "continent"
##   .. .. ..- attr(*, "term.labels")= chr [1:3] "gdpPercap" "pop" "continent"
##   .. .. ..- attr(*, "order")= int [1:3] 1 1 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(lifeExp, gdpPercap, pop, continent)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "numeric" "numeric" "factor"
##   .. .. .. ..- attr(*, "names")= chr [1:4] "lifeExp" "gdpPercap" "pop" "continent"
##  - attr(*, "class")= chr "lm"
out$coefficients
##       (Intercept)         gdpPercap               pop continentAmericas 
##      4.781408e+01      4.495058e-04      6.569823e-09      1.347594e+01 
##     continentAsia   continentEurope  continentOceania 
##      8.192632e+00      1.747269e+01      1.808330e+01
#out$residuals
#out$fitted.values
out$df.residual
## [1] 1697
#out$model

min_gdp <- min(gapminder$gdpPercap)
max_gdp <- max(gapminder$gdpPercap)
med_pop <- median(gapminder$pop)

pred_df <- expand.grid(gdpPercap = seq(from = min_gdp, to = max_gdp, length.out = 100),
                       pop = med_pop, continent = c("Africa", "Americas", "Asia", "Europe", "Oceania"))

dim(pred_df)
## [1] 500   3
head(pred_df)
##   gdpPercap     pop continent
## 1  241.1659 7023596    Africa
## 2 1385.4282 7023596    Africa
## 3 2529.6905 7023596    Africa
## 4 3673.9528 7023596    Africa
## 5 4818.2150 7023596    Africa
## 6 5962.4773 7023596    Africa
pred_out <- predict(object = out, newdata = pred_df, interval = "predict")
head(pred_out)
##        fit      lwr      upr
## 1 47.96863 31.54775 64.38951
## 2 48.48298 32.06231 64.90365
## 3 48.99733 32.57670 65.41797
## 4 49.51169 33.09092 65.93245
## 5 50.02604 33.60497 66.44711
## 6 50.54039 34.11885 66.96193
pred_df <- cbind(pred_df, pred_out)
head(pred_df)
##   gdpPercap     pop continent      fit      lwr      upr
## 1  241.1659 7023596    Africa 47.96863 31.54775 64.38951
## 2 1385.4282 7023596    Africa 48.48298 32.06231 64.90365
## 3 2529.6905 7023596    Africa 48.99733 32.57670 65.41797
## 4 3673.9528 7023596    Africa 49.51169 33.09092 65.93245
## 5 4818.2150 7023596    Africa 50.02604 33.60497 66.44711
## 6 5962.4773 7023596    Africa 50.54039 34.11885 66.96193
p <- ggplot(data = subset(pred_df, continent %in% c("Europe", "Africa")),
            aes(x = gdpPercap, y = fit, ymin = lwr, ymax = upr,
                color = continent, fill = continent, group = continent))

p + geom_point(data = subset(gapminder, continent %in% c("Europe", "Africa")),
               aes(x = gdpPercap, y = lifeExp, color = continent),
               alpha = 0.5, inherit.aes = FALSE) +
    geom_line() +
    geom_ribbon(alpha = 0.2, color = FALSE) +
    scale_x_log10(labels = scales::dollar)

library(broom)

out_comp <- tidy(out)
out_comp %>% round_df()
## # A tibble: 7 x 5
##   term              estimate std.error statistic p.value
##   <chr>                <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)          47.8      0.34     141.         0
## 2 gdpPercap             0        0         19.2        0
## 3 pop                   0        0          3.33       0
## 4 continentAmericas    13.5      0.6       22.5        0
## 5 continentAsia         8.19     0.570     14.3        0
## 6 continentEurope      17.5      0.62      28.0        0
## 7 continentOceania     18.1      1.78      10.2        0
p <- ggplot(out_comp, mapping = aes(x = term, y = estimate))
p + geom_point() + coord_flip()

out_conf <- tidy(out, conf.int = TRUE)
out_conf %>% round_df()
## # A tibble: 7 x 7
##   term              estimate std.error statistic p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)          47.8      0.34     141.         0    47.2      48.5 
## 2 gdpPercap             0        0         19.2        0     0         0   
## 3 pop                   0        0          3.33       0     0         0   
## 4 continentAmericas    13.5      0.6       22.5        0    12.3      14.6 
## 5 continentAsia         8.19     0.570     14.3        0     7.07      9.31
## 6 continentEurope      17.5      0.62      28.0        0    16.2      18.7 
## 7 continentOceania     18.1      1.78      10.2        0    14.6      21.6
out_conf <- subset(out_conf, term %nin% "(Intercept)")
out_conf$nicelabs <- prefix_strip(out_conf$term, "continent")

p <- ggplot(out_conf, mapping = aes(x = reorder(nicelabs, estimate),
                                    y = estimate, ymin = conf.low, ymax = conf.high))
p + geom_pointrange() + coord_flip() + labs(x = NULL, y = "OLS Estimate")

out_aug <- augment(out)
head(out_aug) %>% round_df()
## # A tibble: 6 x 10
##   lifeExp gdpPercap    pop continent .fitted .resid  .hat .sigma .cooksd
##     <dbl>     <dbl>  <dbl> <fct>       <dbl>  <dbl> <dbl>  <dbl>   <dbl>
## 1    28.8      779. 8.43e6 Asia         56.4  -27.6     0   8.34    0.01
## 2    30.3      821. 9.24e6 Asia         56.4  -26.1     0   8.34    0   
## 3    32        853. 1.03e7 Asia         56.5  -24.5     0   8.35    0   
## 4    34.0      836. 1.15e7 Asia         56.5  -22.4     0   8.35    0   
## 5    36.1      740. 1.31e7 Asia         56.4  -20.3     0   8.35    0   
## 6    38.4      786. 1.49e7 Asia         56.5  -18.0     0   8.36    0   
## # ... with 1 more variable: .std.resid <dbl>
out_aug <- augment(out, data = gapminder)
head(out_aug) %>% round_df()
## # A tibble: 6 x 12
##   country continent  year lifeExp    pop gdpPercap .fitted .resid  .hat .sigma
##   <fct>   <fct>     <dbl>   <dbl>  <dbl>     <dbl>   <dbl>  <dbl> <dbl>  <dbl>
## 1 Afghan~ Asia       1952    28.8 8.43e6      779.    56.4  -27.6     0   8.34
## 2 Afghan~ Asia       1957    30.3 9.24e6      821.    56.4  -26.1     0   8.34
## 3 Afghan~ Asia       1962    32   1.03e7      853.    56.5  -24.5     0   8.35
## 4 Afghan~ Asia       1967    34.0 1.15e7      836.    56.5  -22.4     0   8.35
## 5 Afghan~ Asia       1972    36.1 1.31e7      740.    56.4  -20.3     0   8.35
## 6 Afghan~ Asia       1977    38.4 1.49e7      786.    56.5  -18.0     0   8.36
## # ... with 2 more variables: .cooksd <dbl>, .std.resid <dbl>
p <- ggplot(data = out_aug, mapping = aes(x = .fitted, y = .resid))
p + geom_point()

glance(out) %>% round_df()
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic p.value    df logLik    AIC    BIC
##       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
## 1     0.580         0.580  8.37      394.       0     6 -6034. 12084. 12127.
## # ... with 3 more variables: deviance <dbl>, df.residual <dbl>, nobs <dbl>
library(survival)

out_cph <- coxph(Surv(time, status) ~ age + sex, data = lung)
out_surv <- survfit(out_cph)

summary(out_cph)
## Call:
## coxph(formula = Surv(time, status) ~ age + sex, data = lung)
## 
##   n= 228, number of events= 165 
## 
##          coef exp(coef)  se(coef)      z Pr(>|z|)   
## age  0.017045  1.017191  0.009223  1.848  0.06459 . 
## sex -0.513219  0.598566  0.167458 -3.065  0.00218 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##     exp(coef) exp(-coef) lower .95 upper .95
## age    1.0172     0.9831    0.9990    1.0357
## sex    0.5986     1.6707    0.4311    0.8311
## 
## Concordance= 0.603  (se = 0.025 )
## Likelihood ratio test= 14.12  on 2 df,   p=9e-04
## Wald test            = 13.47  on 2 df,   p=0.001
## Score (logrank) test = 13.72  on 2 df,   p=0.001
#summary(out_surv)

out_tidy <- tidy(out_surv)

p <- ggplot(data = out_tidy, mapping = aes(time, estimate))
p + geom_line() + geom_ribbon(mapping = aes(ymin = conf.low, ymax = conf.high), alpha = 0.2)

eu77 <- gapminder %>% filter(continent == "Europe", year == 1977)
fit <- lm(lifeExp ~ log(gdpPercap), data = eu77)
summary(fit)
## 
## Call:
## lm(formula = lifeExp ~ log(gdpPercap), data = eu77)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.4956 -1.0306  0.0935  1.1755  3.7125 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      29.489      7.161   4.118 0.000306 ***
## log(gdpPercap)    4.488      0.756   5.936 2.17e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.114 on 28 degrees of freedom
## Multiple R-squared:  0.5572, Adjusted R-squared:  0.5414 
## F-statistic: 35.24 on 1 and 28 DF,  p-value: 2.173e-06
out_le <- gapminder %>% group_by(continent, year) %>% nest()
out_le
## # A tibble: 60 x 3
## # Groups:   continent, year [60]
##    continent  year data             
##    <fct>     <int> <list>           
##  1 Asia       1952 <tibble [33 x 4]>
##  2 Asia       1957 <tibble [33 x 4]>
##  3 Asia       1962 <tibble [33 x 4]>
##  4 Asia       1967 <tibble [33 x 4]>
##  5 Asia       1972 <tibble [33 x 4]>
##  6 Asia       1977 <tibble [33 x 4]>
##  7 Asia       1982 <tibble [33 x 4]>
##  8 Asia       1987 <tibble [33 x 4]>
##  9 Asia       1992 <tibble [33 x 4]>
## 10 Asia       1997 <tibble [33 x 4]>
## # ... with 50 more rows
out_le %>% filter(continent == "Europe" & year == 1977) %>% unnest()
## Warning: `cols` is now required when using unnest().
## Please use `cols = c(data)`
## # A tibble: 30 x 6
## # Groups:   continent, year [1]
##    continent  year country                lifeExp      pop gdpPercap
##    <fct>     <int> <fct>                    <dbl>    <int>     <dbl>
##  1 Europe     1977 Albania                   68.9  2509048     3533.
##  2 Europe     1977 Austria                   72.2  7568430    19749.
##  3 Europe     1977 Belgium                   72.8  9821800    19118.
##  4 Europe     1977 Bosnia and Herzegovina    69.9  4086000     3528.
##  5 Europe     1977 Bulgaria                  70.8  8797022     7612.
##  6 Europe     1977 Croatia                   70.6  4318673    11305.
##  7 Europe     1977 Czech Republic            70.7 10161915    14800.
##  8 Europe     1977 Denmark                   74.7  5088419    20423.
##  9 Europe     1977 Finland                   72.5  4738902    15605.
## 10 Europe     1977 France                    73.8 53165019    18293.
## # ... with 20 more rows
fit_ols <- function(df) {
    lm(lifeExp ~ log(gdpPercap), data = df)
}

fit_ols
## function(df) {
##     lm(lifeExp ~ log(gdpPercap), data = df)
## }
fit_ols(df = gapminder)
## 
## Call:
## lm(formula = lifeExp ~ log(gdpPercap), data = df)
## 
## Coefficients:
##    (Intercept)  log(gdpPercap)  
##         -9.101           8.405
summary(fit_ols(gapminder))
## 
## Call:
## lm(formula = lifeExp ~ log(gdpPercap), data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.778  -4.204   1.212   4.658  19.285 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -9.1009     1.2277  -7.413 1.93e-13 ***
## log(gdpPercap)   8.4051     0.1488  56.500  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.62 on 1702 degrees of freedom
## Multiple R-squared:  0.6522, Adjusted R-squared:  0.652 
## F-statistic:  3192 on 1 and 1702 DF,  p-value: < 2.2e-16
out_le <- gapminder %>% group_by(continent, year) %>% nest() %>%
    mutate(model = map(data, fit_ols))
        
out_le
## # A tibble: 60 x 4
## # Groups:   continent, year [60]
##    continent  year data              model 
##    <fct>     <int> <list>            <list>
##  1 Asia       1952 <tibble [33 x 4]> <lm>  
##  2 Asia       1957 <tibble [33 x 4]> <lm>  
##  3 Asia       1962 <tibble [33 x 4]> <lm>  
##  4 Asia       1967 <tibble [33 x 4]> <lm>  
##  5 Asia       1972 <tibble [33 x 4]> <lm>  
##  6 Asia       1977 <tibble [33 x 4]> <lm>  
##  7 Asia       1982 <tibble [33 x 4]> <lm>  
##  8 Asia       1987 <tibble [33 x 4]> <lm>  
##  9 Asia       1992 <tibble [33 x 4]> <lm>  
## 10 Asia       1997 <tibble [33 x 4]> <lm>  
## # ... with 50 more rows
out_tidy <- gapminder %>% group_by(continent, year) %>% nest() %>%
    mutate(model = map(data, fit_ols), tidied = map(model, tidy)) %>%
    select(!(data:model)) %>%
    unnest(tidied) %>%
    filter(term %nin% "(Intercept)" & continent %nin% "Oceania") %>%
    ungroup()

out_tidy %>% sample_n(5)
## # A tibble: 5 x 7
##   continent  year term           estimate std.error statistic       p.value
##   <fct>     <int> <chr>             <dbl>     <dbl>     <dbl>         <dbl>
## 1 Asia       1957 log(gdpPercap)     4.17     1.28       3.26 0.00271      
## 2 Asia       2007 log(gdpPercap)     5.16     0.694      7.43 0.0000000226 
## 3 Asia       1992 log(gdpPercap)     5.09     0.649      7.84 0.00000000760
## 4 Asia       1997 log(gdpPercap)     5.11     0.628      8.15 0.00000000335
## 5 Europe     1977 log(gdpPercap)     4.49     0.756      5.94 0.00000217
p <- ggplot(data = out_tidy, mapping = aes(x = year, y = estimate,
                                           ymin = estimate - 2*std.error, ymax = estimate + 2*std.error,
                                           group = continent, color = continent))

p + geom_pointrange(position = position_dodge(width = 1)) +
    scale_x_continuous(breaks = unique(gapminder$year)) +
    theme(legend.position = "top") +
    labs(x = "Year", y = "Estimate", color = "Continent")

library(margins)

gss_sm$polviews_m <- relevel(gss_sm$polviews, ref = "Moderate")
out_bo <- glm(obama ~ polviews_m + sex*race, family = "binomial", data = gss_sm)
summary(out_bo)
## 
## Call:
## glm(formula = obama ~ polviews_m + sex * race, family = "binomial", 
##     data = gss_sm)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9045  -0.5541   0.1772   0.5418   2.2437  
## 
## Coefficients:
##                                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                       0.296493   0.134091   2.211  0.02703 *  
## polviews_mExtremely Liberal       2.372950   0.525045   4.520 6.20e-06 ***
## polviews_mLiberal                 2.600031   0.356666   7.290 3.10e-13 ***
## polviews_mSlightly Liberal        1.293172   0.248435   5.205 1.94e-07 ***
## polviews_mSlightly Conservative  -1.355277   0.181291  -7.476 7.68e-14 ***
## polviews_mConservative           -2.347463   0.200384 -11.715  < 2e-16 ***
## polviews_mExtremely Conservative -2.727384   0.387210  -7.044 1.87e-12 ***
## sexFemale                         0.254866   0.145370   1.753  0.07956 .  
## raceBlack                         3.849526   0.501319   7.679 1.61e-14 ***
## raceOther                        -0.002143   0.435763  -0.005  0.99608    
## sexFemale:raceBlack              -0.197506   0.660066  -0.299  0.76477    
## sexFemale:raceOther               1.574829   0.587657   2.680  0.00737 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2247.9  on 1697  degrees of freedom
## Residual deviance: 1345.9  on 1686  degrees of freedom
##   (1169 observations deleted due to missingness)
## AIC: 1369.9
## 
## Number of Fisher Scoring iterations: 6
bo_m <- margins(out_bo)
summary(bo_m)
##                            factor     AME     SE        z      p   lower
##            polviews_mConservative -0.4119 0.0283 -14.5394 0.0000 -0.4674
##  polviews_mExtremely Conservative -0.4538 0.0420 -10.7971 0.0000 -0.5361
##       polviews_mExtremely Liberal  0.2681 0.0295   9.0996 0.0000  0.2103
##                 polviews_mLiberal  0.2768 0.0229  12.0736 0.0000  0.2319
##   polviews_mSlightly Conservative -0.2658 0.0330  -8.0596 0.0000 -0.3304
##        polviews_mSlightly Liberal  0.1933 0.0303   6.3896 0.0000  0.1340
##                         raceBlack  0.4032 0.0173  23.3568 0.0000  0.3694
##                         raceOther  0.1247 0.0386   3.2297 0.0012  0.0490
##                         sexFemale  0.0443 0.0177   2.5073 0.0122  0.0097
##    upper
##  -0.3564
##  -0.3714
##   0.3258
##   0.3218
##  -0.2011
##   0.2526
##   0.4371
##   0.2005
##   0.0789
plot(bo_m)

bo_gg <- as_tibble(summary(bo_m))
prefixes <- c("polviews_m", "sex")
bo_gg$factor <- prefix_strip(bo_gg$factor, prefixes)
bo_gg$factor <- prefix_replace(bo_gg$factor, "race", "Race: ")

bo_gg %>% select(factor, AME, lower, upper)
## # A tibble: 9 x 4
##   factor                     AME    lower   upper
##   <chr>                    <dbl>    <dbl>   <dbl>
## 1 Conservative           -0.412  -0.467   -0.356 
## 2 Extremely Conservative -0.454  -0.536   -0.371 
## 3 Extremely Liberal       0.268   0.210    0.326 
## 4 Liberal                 0.277   0.232    0.322 
## 5 Slightly Conservative  -0.266  -0.330   -0.201 
## 6 Slightly Liberal        0.193   0.134    0.253 
## 7 Race: Black             0.403   0.369    0.437 
## 8 Race: Other             0.125   0.0490   0.200 
## 9 Female                  0.0443  0.00967  0.0789
p <- ggplot(data = bo_gg, aes(x = reorder(factor, AME),
                              y = AME, ymin = lower, ymax = upper))
p + geom_hline(yintercept = 0, color = "gray80") +
    geom_pointrange() + coord_flip() +
    labs(x = NULL, y = "Average Marginal Effect")

cplot(out_bo, "sex")

pv_cp <- cplot(out_bo, x = "sex", draw = FALSE)

p <- ggplot(data = pv_cp, aes(x = reorder(xvals, yvals),
                              y = yvals, ymin = lower, ymax = upper))
p + geom_hline(yintercept = 0, color = "gray80") +
    geom_pointrange() + coord_flip() +
    labs(x = NULL, y = "Conditional Effect")

library(survey)
## Loading required package: grid
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
## 
##     dotchart
library(srvyr)
## 
## Attaching package: 'srvyr'
## The following object is masked from 'package:stats':
## 
##     filter
options(survey.lonely.psu = "adjust")
options(na.action = "na.pass")

gss_wt <- subset(gss_lon, year > 1974) %>%
    mutate(stratvar = interaction(year, vstrat)) %>%
    as_survey_design(ids = vpsu, strata = stratvar, weights = wtssall, nest = TRUE)

out_grp <- gss_wt %>%
    filter(year %in% seq(1976, 2016, by = 4), !is.na(degree)) %>%
    group_by(year, race, degree) %>%
    summarize(prop = survey_mean(na.rm = TRUE))

out_grp
## # A tibble: 146 x 5
## # Groups:   year, race [30]
##     year race  degree           prop prop_se
##    <dbl> <fct> <fct>           <dbl>   <dbl>
##  1  1976 White Lt High School 0.328  0.0160 
##  2  1976 White High School    0.518  0.0162 
##  3  1976 White Junior College 0.0129 0.00298
##  4  1976 White Bachelor       0.101  0.00960
##  5  1976 White Graduate       0.0393 0.00644
##  6  1976 Black Lt High School 0.562  0.0611 
##  7  1976 Black High School    0.337  0.0476 
##  8  1976 Black Junior College 0.0426 0.0193 
##  9  1976 Black Bachelor       0.0581 0.0239 
## 10  1976 Other Lt High School 0.250  0.152  
## # ... with 136 more rows
out_mrg <- gss_wt %>%
    filter(year %in% seq(1976, 2016, by = 4)) %>%
    mutate(racedeg = interaction(race, degree)) %>%
    group_by(year, racedeg) %>%
    summarize(prop = survey_mean(na.rm = TRUE))

out_mrg
## # A tibble: 155 x 4
## # Groups:   year [10]
##     year racedeg                 prop prop_se
##    <dbl> <fct>                  <dbl>   <dbl>
##  1  1976 White.Lt High School 0.297   0.0146 
##  2  1976 Black.Lt High School 0.0470  0.00837
##  3  1976 Other.Lt High School 0.00194 0.00138
##  4  1976 White.High School    0.469   0.0159 
##  5  1976 Black.High School    0.0282  0.00593
##  6  1976 Other.High School    0.00324 0.00166
##  7  1976 White.Junior College 0.0117  0.00268
##  8  1976 Black.Junior College 0.00356 0.00162
##  9  1976 White.Bachelor       0.0916  0.00883
## 10  1976 Black.Bachelor       0.00486 0.00213
## # ... with 145 more rows
out_mrg <- gss_wt %>%
    filter(year %in% seq(1976, 2016, by = 4)) %>%
    mutate(racedeg = interaction(race, degree)) %>%
    group_by(year, racedeg) %>%
    summarize(prop = survey_mean(na.rm = TRUE)) %>%
    separate(racedeg, sep = "\\.", into = c("race", "degree"))

out_mrg
## # A tibble: 155 x 5
## # Groups:   year [10]
##     year race  degree            prop prop_se
##    <dbl> <chr> <chr>            <dbl>   <dbl>
##  1  1976 White Lt High School 0.297   0.0146 
##  2  1976 Black Lt High School 0.0470  0.00837
##  3  1976 Other Lt High School 0.00194 0.00138
##  4  1976 White High School    0.469   0.0159 
##  5  1976 Black High School    0.0282  0.00593
##  6  1976 Other High School    0.00324 0.00166
##  7  1976 White Junior College 0.0117  0.00268
##  8  1976 Black Junior College 0.00356 0.00162
##  9  1976 White Bachelor       0.0916  0.00883
## 10  1976 Black Bachelor       0.00486 0.00213
## # ... with 145 more rows
p <- ggplot(data = subset(out_grp, race %nin% "Other"),
            mapping = aes(x = degree, y = prop, ymin = prop - 2*prop_se, ymax = prop + 2*prop_se,
                          fill = race, color = race))

dodge <- position_dodge(width = 0.9)

p + geom_col(position = dodge, alpha = 0.2) +
    geom_errorbar(position = dodge, width = 0.2) +
    scale_x_discrete(labels = scales::wrap_format(10)) +
    scale_y_continuous(labels = scales::percent) +
    scale_color_brewer(type = "qual", palette = "Dark2") +
    scale_fill_brewer(type = "qual", palette = "Dark2") +
    labs(title = "Educational Attainment by Race",
         subtitle = "GSS 1976-2016",
         fill = "Race", color = "Race", x = NULL, y = "Percent") +
    facet_wrap(~ year, ncol = 2) +
    theme(legend.position = "top")

p <- ggplot(data = subset(out_grp, race %nin% "Other"),
            mapping = aes(x = year, y = prop, ymin = prop - 2*prop_se, ymax = prop + 2*prop_se,
                          fill = race, color = race))

p + geom_ribbon(alpha = 0.3, aes(color = NULL)) +
    geom_line() +
    scale_y_continuous(labels = scales::percent) +
    scale_color_brewer(type = "qual", palette = "Dark2") +
    scale_fill_brewer(type = "qual", palette = "Dark2") +
    labs(title = "Educational Attainment by Race",
         subtitle = "GSS 1976-2016",
         fill = "Race", color = "Race", x = NULL, y = "Percent") +
    facet_wrap(~ degree, ncol = 1) +
    theme(legend.position = "top")

out <- lm(formula = lifeExp ~ log(gdpPercap) + pop + continent, data = gapminder)
plot(out, which = c(1, 2), ask = FALSE)

#install.packages("ggfortify")   # for using autoplot()

library(coefplot)

out <- lm(formula = lifeExp ~ log(gdpPercap) + log(pop) + continent, data = gapminder)
coefplot(out, sort = "magnitude", intercept = FALSE)

#install.packages("infer")

library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
organdata_sm <- organdata %>% select(donors, pop_dens, pubhealth, roads, consent_law)

ggpairs(data = organdata_sm, mapping = aes(color = consent_law),
        upper = list(continuous = wrap("density"), combo = wrap("box_no_facet")),
        lower = list(continuous = wrap("points"), combo = wrap("dot_no_facet")))
## Warning: Removed 34 rows containing non-finite values (stat_density).
## Warning: Removed 34 rows containing non-finite values (stat_density2d).
## Warning: Removed 37 rows containing non-finite values (stat_density2d).
## Warning: Removed 34 rows containing non-finite values (stat_density2d).
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
## Warning: Removed 34 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing non-finite values (stat_density).
## Warning: Removed 21 rows containing non-finite values (stat_density2d).
## Warning: Removed 17 rows containing non-finite values (stat_density2d).
## Warning: Removed 17 rows containing non-finite values (stat_boxplot).
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 21 rows containing missing values (geom_point).
## Warning: Removed 21 rows containing non-finite values (stat_density).
## Warning: Removed 21 rows containing non-finite values (stat_density2d).
## Warning: Removed 21 rows containing non-finite values (stat_boxplot).
## Warning: Removed 34 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing missing values (geom_point).
## Warning: Removed 21 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing non-finite values (stat_density).
## Warning: Removed 17 rows containing non-finite values (stat_boxplot).
## Warning: Removed 34 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing missing values (geom_point).
## Warning: Removed 21 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing missing values (geom_point).

Draw Maps

election %>% select(state, total_vote, r_points, pct_trump, party, census) %>%
    sample_n(5)
## # A tibble: 5 x 6
##   state        total_vote r_points pct_trump party      census   
##   <chr>             <dbl>    <dbl>     <dbl> <chr>      <chr>    
## 1 South Dakota     370093    29.8       61.5 Republican Midwest  
## 2 Maine            747927    -2.96      44.9 Democratic Northeast
## 3 Colorado        2780247    -4.91      43.2 Democratic West     
## 4 Indiana         2757965    19.0       56.5 Republican Midwest  
## 5 Nevada          1125385    -2.42      45.5 Democratic West
# Hex color codes for Dem Blue and Rep Red
party_colors <- c("#2E74C0", "#CB454A")

p0 <- ggplot(data = subset(election, st %nin% "DC"),
             mapping = aes(x = r_points, y = reorder(state, r_points), color = party))

p1 <- p0 + geom_vline(xintercept = 0, color = "gray30") +
    geom_point(size = 2)

p2 <- p1 + scale_color_manual(values = party_colors)

p3 <- p2 + scale_x_continuous(breaks = c(-30, -20, -10, 0, 10, 20, 30, 40),
                              labels = c("30\n(Clinton)", "20", "10", "0", "10", "20", "30", "40\n(Trump)"))

p3 + facet_wrap(~ census, ncol = 1, scales = "free_y") +
    guides(color = FALSE) + labs(x = "Point Margin", y = NULL) +
    theme(axis.text = element_text(size = 8))

library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
us_states <- map_data("state")
head(us_states)
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>
dim(us_states)
## [1] 15537     6
p <- ggplot(data = us_states, mapping = aes(x = long, y = lat, group = group))
p + geom_polygon(fill = "white", color = "black")

p <- ggplot(data = us_states, mapping = aes(x = long, y = lat, group = group, fill = region))
p + geom_polygon(color = "gray90", size = 0.1) + guides(fill = FALSE)

p <- ggplot(data = us_states, mapping = aes(x = long, y = lat, group = group, fill = region))
p + geom_polygon(color = "gray90", size = 0.1) + 
    coord_map(projection = "albers", lat0 = 39, lat1 = 45) + guides(fill = FALSE)

theme_map <- function(base_size = 9, base_family = NULL) {
    
    require(grid)
    theme_bw(base_size = base_size, base_family = base_family) %+replace%
        theme(axis.line = element_blank(),
              axis.text = element_blank(),
              axis.ticks = element_blank(),
              axis.title = element_blank(),
              panel.background = element_blank(),
              panel.border = element_blank(),
              panel.grid = element_blank(),
              panel.spacing = unit(0, "lines"),
              plot.background = element_blank(),
              legend.justification = c(0,0),
              legend.position = c(0,0))
    
}
election$region <- tolower(election$state)
us_states_elec <- left_join(us_states, election, by = "region") # instead of: merge(us_states, election, sort = FALSE)
head(us_states_elec)
##        long      lat group order  region subregion   state st fips total_vote
## 1 -87.46201 30.38968     1     1 alabama      <NA> Alabama AL    1    2123372
## 2 -87.48493 30.37249     1     2 alabama      <NA> Alabama AL    1    2123372
## 3 -87.52503 30.37249     1     3 alabama      <NA> Alabama AL    1    2123372
## 4 -87.53076 30.33239     1     4 alabama      <NA> Alabama AL    1    2123372
## 5 -87.57087 30.32665     1     5 alabama      <NA> Alabama AL    1    2123372
## 6 -87.58806 30.32665     1     6 alabama      <NA> Alabama AL    1    2123372
##   vote_margin winner      party pct_margin r_points d_points pct_clinton
## 1      588708  Trump Republican     0.2773    27.72   -27.72       34.36
## 2      588708  Trump Republican     0.2773    27.72   -27.72       34.36
## 3      588708  Trump Republican     0.2773    27.72   -27.72       34.36
## 4      588708  Trump Republican     0.2773    27.72   -27.72       34.36
## 5      588708  Trump Republican     0.2773    27.72   -27.72       34.36
## 6      588708  Trump Republican     0.2773    27.72   -27.72       34.36
##   pct_trump pct_johnson pct_other clinton_vote trump_vote johnson_vote
## 1     62.08        2.09      1.46       729547    1318255        44467
## 2     62.08        2.09      1.46       729547    1318255        44467
## 3     62.08        2.09      1.46       729547    1318255        44467
## 4     62.08        2.09      1.46       729547    1318255        44467
## 5     62.08        2.09      1.46       729547    1318255        44467
## 6     62.08        2.09      1.46       729547    1318255        44467
##   other_vote ev_dem ev_rep ev_oth census
## 1      31103      0      9      0  South
## 2      31103      0      9      0  South
## 3      31103      0      9      0  South
## 4      31103      0      9      0  South
## 5      31103      0      9      0  South
## 6      31103      0      9      0  South
p0 <- ggplot(data = us_states_elec, mapping = aes(x = long, y = lat, group = group, fill = party))

p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) + 
    coord_map(projection = "albers", lat0 = 39, lat1 = 45)

p2 <- p1 + scale_fill_manual(values = party_colors) +
    labs(title = "Election Results 2016", fill = NULL)
p2 + theme_map()

p0 <- ggplot(data = us_states_elec, mapping = aes(x = long, y = lat, group = group, fill = pct_trump))

p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) + 
    coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p1 + labs(title = "Trump vote") + theme_map() + labs(fill = "Percent")

p2 <- p1 + scale_fill_gradient(low = "white", high = "#CB454A") +
    labs(title = "Election Results 2016", fill = NULL)
p2 + labs(title = "Trump vote") + theme_map() + labs(fill = "Percent")

p0 <- ggplot(data = us_states_elec, mapping = aes(x = long, y = lat, group = group, fill = d_points))

p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) + 
    coord_map(projection = "albers", lat0 = 39, lat1 = 45)

p2 <- p1 + scale_fill_gradient2() + labs(title = "Winning margins") 
p2 + theme_map() + labs(fill = "Percent")

p3 <- p1 + scale_fill_gradient2(low = "red", mid = scales::muted("purple"), high = "blue",
                                breaks = c(-25, 0, 25, 50, 75)) + 
    labs(title = "Winning margins") 
p3 + theme_map() + labs(fill = "Percent")

p0 <- ggplot(data = subset(us_states_elec, region %nin% "district of columbia"),
             mapping = aes(x = long, y = lat, group = group, fill = d_points))

p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) + 
    coord_map(projection = "albers", lat0 = 39, lat1 = 45)

p2 <- p1 + scale_fill_gradient2(low = "red", mid = scales::muted("purple"), high = "blue") + 
    labs(title = "Winning margins") 
p2 + theme_map() + labs(fill = "Percent")

county_map %>% sample_n(5)
##         long        lat  order  hole piece            group    id
## 1 -1107851.6   285230.8 107574 FALSE     1 0500000US30063.1 30063
## 2  -871084.6  -680143.1 167197 FALSE     1 0500000US49037.1 49037
## 3  1725102.5 -1157391.9  45145 FALSE     1 0500000US13251.1 13251
## 4   459010.5   408867.1  92669 FALSE     1 0500000US27071.1 27071
## 5   893034.9 -1640496.7  72274 FALSE     1 0500000US22057.1 22057
county_data %>% select(id, name, state, pop_dens, pct_black) %>% sample_n(5)
##      id              name state      pop_dens   pct_black
## 1 31057      Dundy County    NE [    0,   10) [ 0.0, 2.0)
## 2 48339 Montgomery County    TX [  100,  500) [ 2.0, 5.0)
## 3 23009    Hancock County    ME [   10,   50) [ 0.0, 2.0)
## 4 22091 St. Helena Parish    LA [   10,   50) [50.0,85.3]
## 5 05049     Fulton County    AR [   10,   50) [ 0.0, 2.0)
head(county_data)
##      id           name state census_region      pop_dens   pop_dens4
## 1     0           <NA>  <NA>          <NA> [   50,  100) [ 45,  118)
## 2 01000              1    AL         South [   50,  100) [ 45,  118)
## 3 01001 Autauga County    AL         South [   50,  100) [ 45,  118)
## 4 01003 Baldwin County    AL         South [  100,  500) [118,71672]
## 5 01005 Barbour County    AL         South [   10,   50) [ 17,   45)
## 6 01007    Bibb County    AL         South [   10,   50) [ 17,   45)
##     pop_dens6   pct_black       pop female white black travel_time  land_area
## 1 [ 82,  215) [10.0,15.0) 318857056   50.8  77.7  13.2        25.5 3531905.43
## 2 [ 82,  215) [25.0,50.0)   4849377   51.5  69.8  26.6        24.2   50645.33
## 3 [ 82,  215) [15.0,25.0)     55395   51.5  78.1  18.4        26.2     594.44
## 4 [ 82,  215) [ 5.0,10.0)    200111   51.2  87.3   9.5        25.9    1589.78
## 5 [ 25,   45) [25.0,50.0)     26887   46.5  50.2  47.6        24.6     884.88
## 6 [ 25,   45) [15.0,25.0)     22506   46.0  76.3  22.1        27.6     622.58
##   hh_income su_gun4 su_gun6 fips votes_dem_2016 votes_gop_2016 total_votes_2016
## 1     53046    <NA>    <NA>    0             NA             NA               NA
## 2     43253    <NA>    <NA> 1000             NA             NA               NA
## 3     53682 [11,54] [10,12) 1001           5908          18110            24661
## 4     50221 [11,54] [10,12) 1003          18409          72780            94090
## 5     32911 [ 5, 8) [ 7, 8) 1005           4848           5431            10390
## 6     36447 [11,54] [10,12) 1007           1874           6733             8748
##   per_dem_2016 per_gop_2016 diff_2016 per_dem_2012 per_gop_2012 diff_2012
## 1           NA           NA        NA           NA           NA        NA
## 2           NA           NA        NA           NA           NA        NA
## 3    0.2395685    0.7343579     12202    0.2657577    0.7263374     11012
## 4    0.1956531    0.7735147     54371    0.2156657    0.7738975     47443
## 5    0.4666025    0.5227141       583    0.5125229    0.4833755       334
## 6    0.2142204    0.7696616      4859    0.2621857    0.7306638      3931
##   winner partywinner16 winner12 partywinner12 flipped
## 1   <NA>          <NA>     <NA>          <NA>    <NA>
## 2   <NA>          <NA>     <NA>          <NA>    <NA>
## 3  Trump    Republican   Romney    Republican      No
## 4  Trump    Republican   Romney    Republican      No
## 5  Trump    Republican    Obama      Democrat     Yes
## 6  Trump    Republican   Romney    Republican      No
county_full <- left_join(county_map, county_data, by = "id")

p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = pop_dens, group = group))

p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()

p2 <- p1 + scale_fill_brewer(palette = "Blues",
                             labels = c("0-10", "10-50", "50-100", "100-500", "500-1,000", "1,000-5,000", ">5,000"))
p2 + labs(fill = "Population per\nsquare mile") + theme_map() +
    guides(fill = guide_legend(nrow = 1)) + theme(legend.position = "bottom")

p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = pct_black, group = group))

p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()

p2 <- p1 + scale_fill_brewer(palette = "Greens")
p2 + labs(fill = "US Population, Percent Black") + theme_map() +
    guides(fill = guide_legend(nrow = 1)) + theme(legend.position = "bottom")

orange_pal <- RColorBrewer::brewer.pal(n = 6, name = "Oranges")
orange_pal
## [1] "#FEEDDE" "#FDD0A2" "#FDAE6B" "#FD8D3C" "#E6550D" "#A63603"
orange_rev <- rev(orange_pal)
orange_rev
## [1] "#A63603" "#E6550D" "#FD8D3C" "#FDAE6B" "#FDD0A2" "#FEEDDE"
gun_p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = su_gun6, group = group))

gun_p1 <- gun_p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()

gun_p2 <- gun_p1 + scale_fill_manual(values = orange_pal)
gun_p2 + labs(title = "Gun-Related Suicides, 1999-2015", fill = "Rate per 100,000 pop") + 
    theme_map() + theme(legend.position = "bottom")

pop_p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = pop_dens6, group = group))

pop_p1 <- pop_p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()

pop_p2 <- pop_p1 + scale_fill_manual(values = orange_rev)
pop_p2 + labs(title = "Reverse-coded Population Density", fill = "People per square mile") + 
    theme_map() + theme(legend.position = "bottom")

library(statebins)

p0 <- ggplot(data = election, mapping = aes(state = state, fill = pct_trump))

p0 + geom_statebins() + coord_equal() +
     scale_fill_distiller(palette = "Reds", direction = 1) +
     labs(fill = "Percent Trump") +
     theme_statebins(legend_position = "top")

p0 <- ggplot(data = subset(election, st %nin% "DC"), mapping = aes(state = state, fill = pct_clinton))

p0 + geom_statebins(lbl_size = 4, light_lbl = "black") + coord_equal() +
     scale_fill_distiller(palette = "Blues", direction = 1) +
     labs(fill = "Percent Clinton") +
     theme_statebins(legend_position = "top")

# Hex color codes for Dem Blue and Rep Red
party_colors <- c("#CB454A", "#2E74C0")

election <- election %>% mutate(color = recode(party, Republican = party_colors[2], Democratic = party_colors[1]))

p0 <- ggplot(data = election, mapping = aes(state = state, fill = color))

p0 + geom_statebins(lbl_size = 4) + coord_equal() +
     scale_fill_manual(values = party_colors, labels = c("Trump", "Clinton")) +
     labs(fill = "Winner") + 
     theme_statebins(legend_position = "right")

election$category <- cut(election$pct_trump, breaks=c(4, 21, 37, 53, 70), 
                         labels = c("4-21", "21-37", "37-53", "53-70"))

model_colors <- RColorBrewer::brewer.pal(n = 4, name = "Reds")

p0 <- ggplot(data = election, mapping = aes(state = state, fill = category))

p0 + geom_statebins(lbl_size = 4) + coord_equal() +
     scale_fill_manual(values = model_colors, labels = c("4-21", "21-37", "37-53", "53-70")) +
     labs(fill = "Percent Trump") +
     theme_statebins(legend_position = "top")

opiates
## # A tibble: 800 x 11
##     year state  fips deaths population crude adjusted adjusted_se region abbr 
##    <int> <chr> <int>  <int>      <int> <dbl>    <dbl>       <dbl> <ord>  <chr>
##  1  1999 Alab~     1     37    4430141   0.8      0.8         0.1 South  AL   
##  2  1999 Alas~     2     27     624779   4.3      4           0.8 West   AK   
##  3  1999 Ariz~     4    229    5023823   4.6      4.7         0.3 West   AZ   
##  4  1999 Arka~     5     28    2651860   1.1      1.1         0.2 South  AR   
##  5  1999 Cali~     6   1474   33499204   4.4      4.5         0.1 West   CA   
##  6  1999 Colo~     8    164    4226018   3.9      3.7         0.3 West   CO   
##  7  1999 Conn~     9    151    3386401   4.5      4.4         0.4 North~ CT   
##  8  1999 Dela~    10     32     774990   4.1      4.1         0.7 South  DE   
##  9  1999 Dist~    11     28     570213   4.9      4.9         0.9 South  DC   
## 10  1999 Flor~    12    402   15759421   2.6      2.6         0.1 South  FL   
## # ... with 790 more rows, and 1 more variable: division_name <chr>
opiates$region <- tolower(opiates$state)
opiates_map <- left_join(us_states, opiates, by = "region")

library(viridis)
## Loading required package: viridisLite
p0 <- ggplot(data = subset(opiates_map, year > 1999),
             mapping = aes(x = long, y = lat, group = group, fill = adjusted))

p1 <- p0 + geom_polygon(color = "gray90", size = 0.05) +
    coord_map(projection = "albers", lat0 = 39, lat1 = 45)

p2 <- p1 + scale_fill_viridis_c(option = "plasma")

p2 + theme_map() + facet_wrap(~ year, ncol = 3) +
    theme(legend.position = "bottom", strip.background = element_blank()) +
    labs(fill = "Death rate per 100,000 population",
         title = "Opiate Related Deaths by State, 2000-2014")

opiates_map$category <- cut_interval(opiates_map$adjusted, n = 5)

p0 <- ggplot(data = subset(opiates_map, year > 1999),
             mapping = aes(x = long, y = lat, group = group, fill = category))

p1 <- p0 + geom_polygon(color = "gray90", size = 0.05) +
    coord_map(projection = "albers", lat0 = 39, lat1 = 45)

p2 <- p1 + scale_fill_viridis_d(option = "plasma")

p2 + theme_map() + facet_wrap(~ year, ncol = 3) +
    theme(legend.position = "bottom", strip.background = element_blank()) +
    labs(fill = "Death rate per 100,000 population",
         title = "Opiate Related Deaths by State, 2000-2014")

library(ggrepel)

p <- ggplot(data = opiates, mapping = aes(x = year, y = adjusted, group = state))
p + geom_line(color = "gray70")
## Warning: Removed 17 row(s) containing missing values (geom_path).

p0 <- ggplot(data = drop_na(opiates, division_name), mapping = aes(x = year, y = adjusted))

p1 <- p0 + geom_line(color = "gray70", mapping = aes(group = state))

p2 <- p1 + geom_smooth(mapping = aes(group = division_name), se = FALSE)

p3 <- p2 + geom_text_repel(data = subset(opiates, year == max(year) & abbr != "DC"),
                           mapping = aes(x = year, y = adjusted, label = abbr),
                           size = 1.8, segment.color = NA, nudge_x = 30) +
    coord_cartesian(c(min(opiates$year), max(opiates$year)))

p3 + labs(x = NULL, y = "Rate per 100,000 population",
          title = "State-Level Opiate Death Rates by Census Division, 1999-2014") +
    facet_wrap(~ reorder(division_name, - adjusted, na.rm = TRUE), nrow = 3)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 27 rows containing non-finite values (stat_smooth).

## Warning: Removed 17 row(s) containing missing values (geom_path).

p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = black, group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()

p2 <- p1 + scale_fill_viridis_c(option = "plasma")
p2 + labs(fill = "US Population, Percent Black") + theme_map() +
    guides(fill = guide_legend(nrow = 1)) + theme(legend.position = "bottom")

p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = pop/land_area, group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()

p2 <- p1 + scale_fill_viridis_c(option = "inferno")
p2 + labs(fill = "Population per square mile") + theme_map() +
    guides(fill = guide_legend(nrow = 1)) + theme(legend.position = "bottom")

p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, fill = pop/land_area, group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()

p2 <- p1 + scale_fill_viridis_c(option = "inferno", breaks = c(1, 10, 100, 1000, 1000, 10000, 70000),
                                trans = scales::pseudo_log_trans(sigma = 0.001))
p2 + labs(fill = "Population per square mile") + theme_map() +
    guides(fill = guide_legend(nrow = 1)) + theme(legend.position = "bottom")

Refine your Plots

library(ggrepel)
head(asasec)
##                                Section         Sname Beginning Revenues
## 1      Aging and the Life Course (018)         Aging     12752    12104
## 2     Alcohol, Drugs and Tobacco (030) Alcohol/Drugs     11933     1144
## 3 Altruism and Social Solidarity (047)      Altruism      1139     1862
## 4            Animals and Society (042)       Animals       473      820
## 5             Asia/Asian America (024)          Asia      9056     2116
## 6            Body and Embodiment (048)          Body      3408     1618
##   Expenses Ending Journal Year Members
## 1    12007  12849      No 2005     598
## 2      400  12677      No 2005     301
## 3     1875   1126      No 2005      NA
## 4     1116    177      No 2005     209
## 5     1710   9462      No 2005     365
## 6     1920   3106      No 2005      NA
p <- ggplot(data = subset(asasec, Year == 2014), mapping = aes(x = Members, y = Revenues, label = Sname))
p + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

p <- ggplot(data = subset(asasec, Year == 2014), mapping = aes(x = Members, y = Revenues, label = Sname))
p + geom_point(mapping = aes(color = Journal)) + geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

p0 <- ggplot(data = subset(asasec, Year == 2014), mapping = aes(x = Members, y = Revenues, label = Sname))

p1 <- p0 + geom_point(mapping = aes(color = Journal)) + geom_smooth(method = "lm", se = FALSE, color = "gray80")

p2 <- p1 + geom_text_repel(data = subset(asasec, Year == 2014 & Revenues > 7000), size = 3)

p3 <- p2 + labs(x = "Membership", y = "Revenues", coloe = "Section has own Journal",
                title = "ASA Sections", subtitle = "2014 Calendar year.", caption = "Source: ASA annual report.")

p4 <- p3 + scale_y_continuous(labels = scales::dollar) +
    theme(legend.position = "bottom")
p4
## `geom_smooth()` using formula 'y ~ x'

p <-ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))

p + geom_point(size = 2) + scale_color_brewer(palette = "Set2") +
    theme(legend.position = "top")
## Warning: Removed 46 rows containing missing values (geom_point).

p + geom_point(size = 2) + scale_color_brewer(palette = "Pastel2") +
    theme(legend.position = "top")
## Warning: Removed 46 rows containing missing values (geom_point).

p + geom_point(size = 2) + scale_color_brewer(palette = "Dark2") +
    theme(legend.position = "top")
## Warning: Removed 46 rows containing missing values (geom_point).

#demo('colors')     

cb_palette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

p4 + scale_color_manual(values = cb_palette)    # also possible: scale_color_hue()
## `geom_smooth()` using formula 'y ~ x'

library(dichromat)      # instead: library(colorblindr)

Default <- RColorBrewer::brewer.pal(5, "Set2")

types <- c("deutan", "protan", "tritan")
names(types) <- c("Deuteronopia", "Protanopia", "Tritanopia")

color_table <- types %>% purrr::map(~ dichromat(Default, .x)) %>%
    as_tibble() %>% add_column(Default, .before = TRUE)

color_table
## # A tibble: 5 x 4
##   Default Deuteronopia Protanopia Tritanopia
##   <chr>   <chr>        <chr>      <chr>     
## 1 #66C2A5 #AEAEA7      #BABAA5    #82BDBD   
## 2 #FC8D62 #B6B661      #9E9E63    #F29494   
## 3 #8DA0CB #9C9CCB      #9E9ECB    #92ABAB   
## 4 #E78AC3 #ACACC1      #9898C3    #DA9C9C   
## 5 #A6D854 #CACA5E      #D3D355    #B6C8C8
color_comp(color_table)

# Democrat Blue and Republican Red
party_colors <- c("#2E74C0", "#CB454A")

p0 <- ggplot(data = subset(county_data, flipped == "No"),
             mapping = aes(x = pop, y = black/100))

p1 <- p0 + geom_point(alpha = 0.15, color = "gray50") +
    scale_x_log10(labels = scales::comma)
p1

p2 <- p1 + geom_point(data = subset(county_data, flipped == "Yes"),
                      mapping = aes(x = pop, y = black/100, color = partywinner16)) +
    scale_color_manual(values = party_colors)
p2

p3 <- p2 + scale_y_continuous(labels = scales::percent) +
    labs(color = "County flipped to ...", x = "County Population(log scale)", y = "Percent Black Population",
         title = "Flipped counties, 2016", caption = "Counties in gray did not flip.")
p3

p4 <- p3 + geom_text_repel(data = subset(county_data, flipped == "Yes" & black > 25),
                           mapping = aes(x = pop, y = black/100, label = state), size = 2)
p4 + theme_minimal()  +
    theme(legend.position = "top")

theme_set(theme_bw())       # also: theme_classic(), theme_gray() and default theme_grey()
p4 + theme(legend.position = "top")

theme_set(theme_dark())
p4 + theme(legend.position = "top")

library(ggthemes)  # also: library(cowplot) and library(hrbrthemes)
## 
## Attaching package: 'ggthemes'
## The following object is masked _by_ '.GlobalEnv':
## 
##     theme_map
theme_set(theme_economist())
p4 + theme(legend.position = "top")

theme_set(theme_wsj())
p4 + theme(legend.position = "top")

p4 + theme(plot.title = element_text(size = rel(0.6)),
           legend.title = element_text(size = rel(0.35)),
           plot.caption = element_text(size = rel(0.35)),
           legend.position = "top")

theme_set(theme_grey())
p4 + theme(legend.position = "top")

p4 + theme(legend.position = "top",
           plot.title = element_text(size = rel(2),
                                     lineheight = .5,
                                     family = "Times",
                                     face = "bold.italic",
                                     color = "orange"),
           axis.text.x = element_text(size = rel(1),
                                      family = "Courier",
                                      face = "bold",
                                      color = "purple"))
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): rodzina
## czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): rodzina
## czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows

yrs <- c(seq(1972, 1988, 4), 1993, seq(1996, 2016, 4))

mean_age <- gss_lon %>%
    filter(age %nin% NA && year %in% yrs) %>%
    group_by(year) %>%
    summarize(xbar = round(mean(age, na.rm = TRUE), 0))

mean_age$y <- 0.3
yr_labs <- data.frame(x = 85, y = 0.8, year = yrs)

p <- ggplot(data = subset(gss_lon, year %in% yrs),
            mapping = aes(x = age))

p1 <- p + geom_density(fill = "gray20", color = FALSE, alpha = 0.9, mapping = aes(y = ..scaled..)) +
    geom_vline(data = subset(mean_age, year %in% yrs),
               aes(xintercept = xbar), color = "white", size = 0.5) +
    geom_text(data = subset(mean_age, year %in% yrs),
              aes(x = xbar, y = y, label = xbar), nudge_x = 7.5, color = "white", size = 3.5, hjust = 1) +
    geom_text(data = subset(yr_labs, year %in% yrs),
              aes(x = x, y = y, label = year)) +
    facet_grid(year ~ ., switch = "y")
p1
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
## Warning: Removed 83 rows containing non-finite values (stat_density).

library(hrbrthemes)
#library(myriad)

p1 + theme_ipsum(base_size = 10, plot_title_size = 10, # with library(myriad): theme_book, panel_spacing = unit(0.1, "lines")
                strip_text_size = 32) +
    theme(plot.title = element_text(size = 16),
          axis.text.x = element_text(size = 12),
          axis.title.y = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank(),
          strip.background = element_blank(),
          strip.text.y = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank()) +
    labs(x = "Age", y = NULL, title = "Age Distribution of\nGSS Respondents")
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
## Warning: Removed 83 rows containing non-finite values (stat_density).
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): rodzina
## czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): rodzina
## czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): rodzina
## czcionek nie została znaleziona w bazie czcionek systemu Windows
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## rodzina czcionek nie została znaleziona w bazie czcionek systemu Windows

library(ggridges)

p <- ggplot(data = gss_lon,
            mapping = aes(x = age, y = factor(year, levels = rev(unique(year)), ordered = TRUE)))

p + geom_density_ridges(alpha = 0.6, fill = "lightblue", scale = 1.5) +
    scale_x_continuous(breaks = c(25, 50, 75)) +
    scale_y_discrete(expand = c(0.01, 0)) +
    labs(x = "Age", y = NULL, title = "Age Distribution of\nGSS Respondents") +
    theme_ridges() +
    theme(title = element_text(size = 16, face = "bold"))
## Picking joint bandwidth of 3.49
## Warning: Removed 221 rows containing non-finite values (stat_density_ridges).

head(fredts)
##         date  sp500 monbase  sp500_i monbase_i
## 1 2009-03-11 696.68 1542228 100.0000  100.0000
## 2 2009-03-18 766.73 1693133 110.0548  109.7849
## 3 2009-03-25 799.10 1693133 114.7012  109.7849
## 4 2009-04-01 809.06 1733017 116.1308  112.3710
## 5 2009-04-08 830.61 1733017 119.2240  112.3710
## 6 2009-04-15 852.21 1789878 122.3245  116.0579
fredts_m <- fredts %>% select(date, sp500_i, monbase_i) %>% gather(key = series, value = score, sp500_i:monbase_i)
head(fredts_m)
##         date  series    score
## 1 2009-03-11 sp500_i 100.0000
## 2 2009-03-18 sp500_i 110.0548
## 3 2009-03-25 sp500_i 114.7012
## 4 2009-04-01 sp500_i 116.1308
## 5 2009-04-08 sp500_i 119.2240
## 6 2009-04-15 sp500_i 122.3245
p <- ggplot(data = fredts_m, mapping = aes(x = date, y = score, group = series, color = series))

p1 <- p + geom_line() + theme(legend.position = "top") +
    labs(x = NULL, y = "Index", color = "Series")

p <- ggplot(data = fredts, mapping = aes(x = date, y = sp500_i - monbase_i))

p2 <- p + geom_line() +
    labs(x = "Date", y = "Difference")

cowplot::plot_grid(p1, p2, nrow = 2, rel_heights = c(0.75, 0.25), align = "v") # also: library(gridExtra) with grid.arrange()
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

head(yahoo)
## # A tibble: 6 x 4
##    Year Revenue Employees Mayer
##   <dbl>   <dbl>     <dbl> <chr>
## 1  2004    3574      7600 No   
## 2  2005    5257      9800 No   
## 3  2006    6425     11400 No   
## 4  2007    6969     14300 No   
## 5  2008    7208     13600 No   
## 6  2009    6460     13900 No
p <- ggplot(data = yahoo, mapping = aes(x = Employees, y = Revenue))

p + geom_path(color = "gray80") +
    geom_text(aes(color = Mayer, label = Year), size = 3, fontface = "bold") +
    theme(legend.position = "bottom") +
    labs(color = "Mayer is CEO", x = "Employees", y = "Revenue (Millions)",
         title = "Yahoo Employees vs Revenues, 2004-2014") +
    scale_y_continuous(labels = scales::dollar) +
    scale_x_continuous(labels = scales::comma)

p <- ggplot(data = yahoo, mapping = aes(x = Year, y = Revenue/Employees))

p + geom_vline(xintercept = 2012) +
    geom_line(color = "gray60", size = 2) +
    annotate("text", x = 2013, y = 0.44, label = "Mayer becomes CEO", size = 3) +
    labs(x = "Year", y = "Revenue/Employees", title = "Yahoo Revenue to Employee Ratio, 2004-2014")

head(studebt)
## # A tibble: 6 x 4
##   Debt     type        pct Debtrc  
##   <ord>    <fct>     <int> <ord>   
## 1 Under $5 Borrowers    20 Under $5
## 2 $5-$10   Borrowers    17 $5-$10  
## 3 $10-$25  Borrowers    28 $10-$25 
## 4 $25-$50  Borrowers    19 $25-$50 
## 5 $50-$75  Borrowers     8 $50-$75 
## 6 $75-$100 Borrowers     3 $75-$100
studebt
## # A tibble: 16 x 4
##    Debt      type        pct Debtrc   
##    <ord>     <fct>     <int> <ord>    
##  1 Under $5  Borrowers    20 Under $5 
##  2 $5-$10    Borrowers    17 $5-$10   
##  3 $10-$25   Borrowers    28 $10-$25  
##  4 $25-$50   Borrowers    19 $25-$50  
##  5 $50-$75   Borrowers     8 $50-$75  
##  6 $75-$100  Borrowers     3 $75-$100 
##  7 $100-$200 Borrowers     4 $100-$200
##  8 Over $200 Borrowers     1 Over $200
##  9 Under $5  Balances      2 Under $5 
## 10 $5-$10    Balances      4 $5-$10   
## 11 $10-$25   Balances     15 $10-$25  
## 12 $25-$50   Balances     23 $25-$50  
## 13 $50-$75   Balances     16 $50-$75  
## 14 $75-$100  Balances     10 $75-$100 
## 15 $100-$200 Balances     19 $100-$200
## 16 Over $200 Balances     11 Over $200
p_xlab <- "Amount Owed, in thousands of Dollars"
p_title <- "Outstanding Student Loans"
p_subtitle <- "44 million borrowers owe a total of $1.3 trillion"
p_caption <- "Source: FRB NY"

f_labs <- c(`Borrowers` = "Percent of\nall Borrowers",
            `Balances` = "Percent of\nall Balances")

p <- ggplot(data = studebt, mapping = aes(x = Debt, y = pct/100, fill = type))

p + geom_bar(stat = "identity") +
    scale_fill_brewer(type = "qual", palette = "Dark2") +
    scale_y_continuous(labels = scales::percent) +
    guides(fill = FALSE) +
    theme(strip.text.x = element_text(face = "bold")) +
    labs(y = NULL, x = p_xlab, caption = p_caption, title = p_title, subtitle = p_subtitle) +
    facet_grid(~ type, labeller = as_labeller(f_labs)) +
    coord_flip()

library(viridis)

p <- ggplot(studebt, aes(y = pct/100, x = type, fill = Debtrc))

p + geom_bar(stat = "identity", color = "gray80") +
    scale_x_discrete(labels = as_labeller(f_labs)) +
    scale_y_continuous(labels = scales::percent) +
    scale_fill_viridis(discrete = TRUE) +
    guides(fill = guide_legend(reverse = TRUE,
                               title.position = "top", label.position = "bottom", keywidth = 3, nrow = 1)) +
    labs(x = NULL, y = NULL, fill = "Amount Owed, in thousands of dollars",
         caption = p_caption, title = p_title, subtitle = p_subtitle) +
    theme(legend.position = "top",
          axis.text.y = element_text(face = "bold", hjust = 1, size = 10),
          axis.ticks.length = unit(0, "cm"),
          panel.grid.major.y = element_blank()) +
    coord_flip()

Appendix

my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
your_numbers <- c(5, 31, 71, 1, 3, 21, 6)

my_numbers[4]
## [1] 1
my_numbers[7]
## [1] 25
my_numbers[2:4]
## [1] 2 3 1
my_numbers[c(2,4)]
## [1] 2 1
my_tb <- tibble(mine = c(1, 4, 5, 8:11), yours = c(3, 20, 16, 34:31))
class(my_tb)
## [1] "tbl_df"     "tbl"        "data.frame"
my_tb
## # A tibble: 7 x 2
##    mine yours
##   <dbl> <dbl>
## 1     1     3
## 2     4    20
## 3     5    16
## 4     8    34
## 5     9    33
## 6    10    32
## 7    11    31
my_tb[3, 1]         # row 3 col 1
## # A tibble: 1 x 1
##    mine
##   <dbl>
## 1     5
my_tb[1, 2]         # row 1 col 2
## # A tibble: 1 x 1
##   yours
##   <dbl>
## 1     3
my_tb[3, "mine"]    # row 3 col 1
## # A tibble: 1 x 1
##    mine
##   <dbl>
## 1     5
my_tb[1, "yours"]   # row 1 col 2
## # A tibble: 1 x 1
##   yours
##   <dbl>
## 1     3
my_tb[, "mine"]     # all rows col 1
## # A tibble: 7 x 1
##    mine
##   <dbl>
## 1     1
## 2     4
## 3     5
## 4     8
## 5     9
## 6    10
## 7    11
my_tb[4, ]          # row 4 all cols
## # A tibble: 1 x 2
##    mine yours
##   <dbl> <dbl>
## 1     8    34
my_tb$mine
## [1]  1  4  5  8  9 10 11
out <- lm(mine ~ yours, data = my_tb)
out$coefficients
## (Intercept)       yours 
## -0.08011921  0.28734222
out$call
## lm(formula = mine ~ yours, data = my_tb)
out$qr$rank         # nested
## [1] 2
my_tb$ours <- my_tb$mine + my_tb$yours
my_tb
## # A tibble: 7 x 3
##    mine yours  ours
##   <dbl> <dbl> <dbl>
## 1     1     3     4
## 2     4    20    24
## 3     5    16    21
## 4     8    34    42
## 5     9    33    42
## 6    10    32    42
## 7    11    31    42
edu
## # A tibble: 366 x 11
##    age   sex    year total elem4 elem8   hs3   hs4 coll3 coll4 median
##    <chr> <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1 25-34 Male   2016 21845   116   468  1427  6386  6015  7432     NA
##  2 25-34 Male   2015 21427   166   488  1584  6198  5920  7071     NA
##  3 25-34 Male   2014 21217   151   512  1611  6323  5910  6710     NA
##  4 25-34 Male   2013 20816   161   582  1747  6058  5749  6519     NA
##  5 25-34 Male   2012 20464   161   579  1707  6127  5619  6270     NA
##  6 25-34 Male   2011 20985   190   657  1791  6444  5750  6151     NA
##  7 25-34 Male   2010 20689   186   641  1866  6458  5587  5951     NA
##  8 25-34 Male   2009 20440   184   695  1806  6495  5508  5752     NA
##  9 25-34 Male   2008 20210   172   714  1874  6356  5277  5816     NA
## 10 25-34 Male   2007 20024   246   757  1930  6361  5137  5593     NA
## # ... with 356 more rows
edu_t <- gather(data = edu, key = school, value = freq, elem4:coll4)

head(edu_t)
## # A tibble: 6 x 7
##   age   sex    year total median school  freq
##   <chr> <chr> <int> <int>  <dbl> <chr>  <dbl>
## 1 25-34 Male   2016 21845     NA elem4    116
## 2 25-34 Male   2015 21427     NA elem4    166
## 3 25-34 Male   2014 21217     NA elem4    151
## 4 25-34 Male   2013 20816     NA elem4    161
## 5 25-34 Male   2012 20464     NA elem4    161
## 6 25-34 Male   2011 20985     NA elem4    190
tail(edu_t)
## # A tibble: 6 x 7
##   age   sex     year total median school  freq
##   <chr> <chr>  <int> <int>  <dbl> <chr>  <dbl>
## 1 55>   Female  1959 16263    8.3 coll4    688
## 2 55>   Female  1957 15581    8.2 coll4    630
## 3 55>   Female  1952 13662    7.9 coll4    628
## 4 55>   Female  1950 13150    8.4 coll4    436
## 5 55>   Female  1947 11810    7.6 coll4    343
## 6 55>   Female  1940  9777    8.3 coll4    219
head(bad_date)
## # A tibble: 6 x 2
##   date       N
##   <chr>  <int>
## 1 9/1/11 44426
## 2 9/2/11 55112
## 3 9/3/11 19263
## 4 9/4/11 12330
## 5 9/5/11  8534
## 6 9/6/11 59490
p <- ggplot(data = bad_date, aes(x = date, y = N))
p + geom_line()
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

bad_date2 <- rbind(bad_date, bad_date)

p <- ggplot(data = bad_date2, aes(x = date, y = N))
p + geom_line()

#install.packages('lubridate')
library('lubridate')
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
bad_date3 <- bad_date
bad_date3$date <- mdy(bad_date$date)
head(bad_date3)
## # A tibble: 6 x 2
##   date           N
##   <date>     <int>
## 1 2011-09-01 44426
## 2 2011-09-02 55112
## 3 2011-09-03 19263
## 4 2011-09-04 12330
## 5 2011-09-05  8534
## 6 2011-09-06 59490
p <- ggplot(data = bad_date3, aes(x = date, y = N))
p + geom_line()

url <- "https://cdn.rawgit.com/kjhealy/viz-organdata/master/organdonation.csv"
bad_year <- read_csv(url)
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   country = col_character(),
##   world = col_character(),
##   opt = col_character(),
##   consent.law = col_character(),
##   consent.practice = col_character(),
##   consistent = col_character(),
##   ccode = col_character()
## )
## i Use `spec()` for the full column specifications.
bad_year %>% select(1:3) %>% sample_n(10)
## # A tibble: 10 x 3
##    country        year donors
##    <chr>         <dbl>  <dbl>
##  1 United States  2000   21.2
##  2 Belgium          NA   NA  
##  3 Switzerland    1996   12.6
##  4 Ireland        1997   20.9
##  5 Norway         2001   14.4
##  6 Australia      1994   10.2
##  7 Switzerland    1997   14.3
##  8 Italy          1997   11.6
##  9 France         1995   15.1
## 10 Belgium        1994   22.8
p <- ggplot(data = bad_year, aes(x = year, y = donors))
p + geom_point()
## Warning: Removed 34 rows containing missing values (geom_point).

bad_year$year <- int_to_year(bad_year$year)
bad_year %>% select(1:3)
## # A tibble: 238 x 3
##    country   year       donors
##    <chr>     <date>      <dbl>
##  1 Australia NA          NA   
##  2 Australia 1991-06-15  12.1 
##  3 Australia 1992-06-15  12.4 
##  4 Australia 1993-06-15  12.5 
##  5 Australia 1994-06-15  10.2 
##  6 Australia 1995-06-15  10.2 
##  7 Australia 1996-06-15  10.6 
##  8 Australia 1997-06-15  10.3 
##  9 Australia 1998-06-15  10.5 
## 10 Australia 1999-06-15   8.67
## # ... with 228 more rows
p <- ggplot(data = bad_year, aes(x = year, y = donors))
p + geom_point()
## Warning: Removed 34 rows containing missing values (geom_point).

add_xy <- function(x, y) {
    x + y
}

add_xy(x = 1, y = 7)
## [1] 8
add_xy(5, 2)
## [1] 7
plot_section <- function(section = "Culture", x = "Year", y = "Members", data = asasec, smooth = FALSE) {
    
    require(ggplot2)
    require(splines)
    
    p <- ggplot(subset(data, Sname == section), mapping = aes_string(x = x, y = y))     # aes_string() instead of aes()
    
    if(smooth == TRUE) {
        
        p0 <- p + geom_smooth(color = "#999999", size = 1.2, method = "lm", formula = y ~ ns(x, 3)) +
            scale_x_continuous(breaks = c(seq(2005, 2015, 4))) +
            labs(title = section)
        
    } else {
        
        p0 <- p + geom_line(color = "#E69F00", size = 1.2) +
            scale_x_continuous(breaks = c(seq(2005, 2015, 4))) +
            labs(title = section)
        
    }
    
    print(p0)
    
}

plot_section("Rationality")
## Loading required package: splines

plot_section("Sexualities", smooth = TRUE)

plot_section <- function(section = "Culture", x = "Year", y = "Members", data = asasec, smooth = FALSE, ...) {
    
    require(ggplot2)
    require(splines)
    
    p <- ggplot(subset(data, Sname == section), mapping = aes_string(x = x, y = y))     # aes_string() instead of aes()
    
    if(smooth == TRUE) {
        
        p0 <- p + geom_smooth(color = "#999999", size = 1.2, ...) +
            scale_x_continuous(breaks = c(seq(2005, 2015, 4))) +
            labs(title = section)
        
    } else {
        
        p0 <- p + geom_line(color = "#E69F00", size = 1.2) +
            scale_x_continuous(breaks = c(seq(2005, 2015, 4))) +
            labs(title = section)
        
    }
    
    print(p0)
    
}

plot_section("Comm/Urban", smooth = TRUE, method = "loess")
## `geom_smooth()` using formula 'y ~ x'

plot_section("Children", smooth = TRUE, method = "lm", formula = y ~ ns(x, 2))